• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2020, Vol. 42 ›› Issue (11): 1973-1980.

• High Performance Computing

  • Funding:
    National Key Research and Development Program of China (2018YFB1003602); National Natural Science Foundation of China (U19A2060)

Cold start optimization on function computing for high performance computing 

LI Zhe, TAN Yusong, LI Bao, YU Jie

  1. (School of Computer, National University of Defense Technology, Changsha 410073, China)

  • Received: 2020-06-11 Revised: 2020-07-21 Accepted: 2020-11-25 Online: 2020-11-25 Published: 2020-11-30

Abstract: High performance computing (HPC) problems typically decompose into subtasks that can run in parallel, and they consume large amounts of computing resources during execution. Traditional cloud computing, which uses virtual machines as distributed nodes, has been shown to handle some common HPC problems well, but managing the distributed environment and designing a distributed solution make the process complex. Function computing is a new serverless cloud computing paradigm whose automatic scaling and considerable computing resources fit HPC problems well. However, the cold start delay caused by automatic scaling is unavoidable on public cloud function computing platforms, and it is further magnified for HPC tasks, which submit highly concurrent jobs. In this paper, we first analyze the completion time of a simple HPC task under cold start and hot start conditions and identify the causes of the additional delay. Based on this analysis, we combine time series analysis tools with the platform's own automatic scaling mechanism to propose a preheating method that effectively reduces the cold start delay of HPC tasks on the function computing platform.

Key words: high performance computing, function computing, cold start, preheating