
Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (07): 1210-1217.

• High Performance Computing •

A switching method for model inference serving oriented to serverless computing

WEN Xin, ZENG Tao, LI Chun-bo, XU Zi-chen

  (School of Mathematics and Computer Science, Nanchang University, Nanchang 330031, China)
  • Received: 2023-10-12 Revised: 2023-11-21 Accepted: 2024-07-25 Online: 2024-07-25 Published: 2024-07-18

Abstract: The rapid development of large-scale models has made model inference services widely deployed, and building stable and reliable architectural support for them has become a focus for cloud service providers. Serverless computing is a cloud computing paradigm with fine-grained resource allocation and a high level of abstraction; its on-demand billing and elastic scaling can effectively improve the computational efficiency of model inference services. However, model inference service workflows are multi-stage, and a single serverless computing framework cannot guarantee optimal execution of every stage. The key problem is therefore how to exploit the performance characteristics of different serverless computing frameworks to switch model inference service workflows online and reduce their overall execution time. This paper studies the problem of switching model inference services across different serverless computing frameworks. Firstly, model inference service functions are constructed from pre-trained models and used to derive the performance characteristics of heterogeneous serverless computing frameworks. Secondly, a machine learning technique is employed to build a binary classification model over these performance characteristics, enabling online switching of the model inference service framework. Finally, a testing platform is built to generate model inference service workflows and evaluate the online switching framework prototype. Preliminary experimental results show that, compared with a standalone serverless computing framework, the online switching prototype reduces the execution time of model inference service workflows by up to 57%.
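The abstract only outlines the approach. The sketch below illustrates the core idea of the second step: training a binary classifier on per-stage performance features and using it to decide, online, which of two serverless frameworks should run each stage of an inference workflow. Everything in it (the feature fields, the synthetic labels, the random-forest classifier) is an illustrative assumption, not the paper's actual design.

# Minimal sketch (hypothetical, not the paper's implementation): a binary
# classifier over measured performance features routes each workflow stage
# to one of two serverless frameworks.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed per-stage features: input size, model parameter count,
# cold-start probability, current concurrency. The paper derives its own
# performance characteristics; these fields are placeholders.
rng = np.random.default_rng(0)
X_train = rng.random((500, 4))
# Label 1 = "framework B is faster for this stage", 0 = "framework A".
# In practice labels would come from profiling runs on both frameworks.
y_train = (X_train[:, 0] + 0.5 * X_train[:, 2] > 0.8).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

def choose_framework(stage_features: np.ndarray) -> str:
    """Online switching decision for one workflow stage."""
    label = clf.predict(stage_features.reshape(1, -1))[0]
    return "framework_B" if label == 1 else "framework_A"

# Route each stage of an incoming multi-stage inference workflow,
# e.g. preprocessing, inference, postprocessing.
workflow = [rng.random(4) for _ in range(3)]
for i, feats in enumerate(workflow):
    print(f"stage {i}: run on {choose_framework(feats)}")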

Key words: model inference service, serverless computing, machine learning