
Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (07): 1210-1217.

• High Performance Computing •

A switching method for model inference serving oriented to serverless computing

WEN Xin, ZENG Tao, LI Chun-bo, XU Zi-chen

  (School of Mathematics and Computer Science, Nanchang University, Nanchang 330031, China)
  • Received: 2023-10-12 Revised: 2023-11-21 Accepted: 2024-07-25 Online: 2024-07-25 Published: 2024-07-18

Abstract: The rapid development of large-scale models has made model inference services widely deployed, and building stable and reliable architectural support for them has become a focus for cloud service providers. Serverless computing is a cloud computing paradigm with fine-grained resource allocation and a high level of abstraction; its on-demand billing and elastic scaling can effectively improve the computational efficiency of model inference services. However, model inference service workflows are multi-stage, and a single serverless computing framework cannot guarantee optimal execution of every stage. The key problem is therefore how to exploit the performance characteristics of different serverless computing frameworks to switch model inference service workflows online and reduce their overall execution time. This paper studies the problem of switching model inference services across different serverless computing frameworks. Firstly, model inference service functions are constructed from pre-trained models and used to derive the performance characteristics of heterogeneous serverless computing frameworks. Secondly, a machine learning technique is employed to build a binary classification model over these performance characteristics, enabling online switching of the model inference service framework. Finally, a testing platform is built to generate model inference service workflows and evaluate the online switching framework prototype. Preliminary experimental results show that, compared with a standalone serverless computing framework, the online switching prototype reduces the execution time of model inference service workflows by up to 57%.
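The abstract only outlines the approach. The sketch below illustrates the core idea of the second step: training a binary classifier on per-stage performance features and using it to decide, online, which of two serverless frameworks should run each stage of an inference workflow. Everything in it (the feature fields, the synthetic labels, the random-forest classifier) is an illustrative assumption, not the paper's actual design.

# Minimal sketch (hypothetical, not the paper's implementation): a binary
# classifier over measured performance features routes each workflow stage
# to one of two serverless frameworks.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed per-stage features: input size, model parameter count,
# cold-start probability, current concurrency. The paper derives its own
# performance characteristics; these fields are placeholders.
rng = np.random.default_rng(0)
X_train = rng.random((500, 4))
# Label 1 = "framework B is faster for this stage", 0 = "framework A".
# In practice labels would come from profiling runs on both frameworks.
y_train = (X_train[:, 0] + 0.5 * X_train[:, 2] > 0.8).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

def choose_framework(stage_features: np.ndarray) -> str:
    """Online switching decision for one workflow stage."""
    label = clf.predict(stage_features.reshape(1, -1))[0]
    return "framework_B" if label == 1 else "framework_A"

# Route each stage of an incoming multi-stage inference workflow,
# e.g. preprocessing, inference, postprocessing.
workflow = [rng.random(4) for _ in range(3)]
for i, feats in enumerate(workflow):
    print(f"stage {i}: run on {choose_framework(feats)}")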

Key words: model inference service, serverless computing, machine learning