
Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (07): 1210-1217.

• High Performance Computing •

A switching method for model inference serving oriented to serverless computing

WEN Xin, ZENG Tao, LI Chun-bo, XU Zi-chen

  (School of Mathematics and Computer Science, Nanchang University, Nanchang 330031, China)
  • Received: 2023-10-12  Revised: 2023-11-21  Accepted: 2024-07-25  Online: 2024-07-25  Published: 2024-07-18
  • Supported by:
    National Key R&D Program of China (2022YFB4501703); Key R&D Program of the Jiangxi Provincial Department of Science and Technology (20212BBE53004); Jiangxi Provincial Science and Technology Special Fund of Nanchang University (ZBG20230418043); Jiangxi Provincial Graduate Innovation Fund (YC2023-B010)

Abstract: The development of large-scale models has led to the widespread application of model inference services, and building stable, reliable architectural support for these services has become a focus for cloud service providers. Serverless computing is a cloud computing paradigm with fine-grained resources and a high degree of abstraction; its advantages, such as on-demand billing and elastic scaling, can effectively improve the computational efficiency of model inference services. However, model inference service workflows are multi-stage, which makes it difficult for any single serverless computing framework to execute every stage optimally. The key open problem is therefore how to exploit the performance characteristics of different serverless computing frameworks to switch the stages of a model inference service workflow online and shorten the workflow's overall execution time. This paper studies the problem of switching model inference services across different serverless computing frameworks. First, pre-trained models are used to construct model inference service functions, from which the performance characteristics of heterogeneous serverless computing frameworks are derived. Second, machine learning is used to build a binary classification model that, combined with these performance characteristics, realizes a prototype framework for online switching of model inference services. Finally, a test platform is built to generate model inference service workflows and evaluate the prototype's performance. Preliminary experimental results show that, compared with an independent serverless computing framework, the online switching prototype reduces the execution time of model inference service workflows by up to 57%.
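To make the switching decision concrete, the following is a minimal sketch of the kind of binary classifier the abstract describes: given profiled features of a workflow stage, it predicts which of two heterogeneous serverless frameworks should execute that stage. The feature set, the framework labels, and the scikit-learn pipeline are illustrative assumptions, not the authors' actual implementation.

# Hypothetical sketch: a binary classifier that picks which of two
# serverless frameworks should run each stage of a model inference
# workflow. Features and labels are assumptions for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Each row profiles one workflow stage:
# (model size in MB, input payload in KB, measured cold-start latency in ms).
X_train = np.array([
    [120,  40,  850],
    [500, 300, 1900],
    [ 60,  10,  300],
    [950, 650, 2400],
])
# Label 0: framework A finished this stage faster; label 1: framework B did.
y_train = np.array([0, 1, 0, 1])

# Standardizing the features keeps the logistic regression well conditioned.
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)

def choose_framework(stage_features):
    """Predict which framework should execute a stage with these features."""
    label = clf.predict(np.array([stage_features]))[0]
    return "framework-A" if label == 0 else "framework-B"

# Route an incoming stage (200 MB model, 80 KB input, 1100 ms cold start).
print(choose_framework([200, 80, 1100]))

A simple model such as logistic regression is a natural first choice here because its prediction cost is negligible next to function cold-start times, so the switching decision itself adds little to the workflow's critical path.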

Key words: model inference service, serverless computing, machine learning