• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


HPC cluster and low cost auto scaling model based on public cloud

TIAN Yongjun,HE Wanqing,SUN Xiangzheng,YU Yang   

  1. (Alibaba Cloud Computing Co., Ltd., Hangzhou 310024, China)
  • Received:2018-10-17 Revised:2018-12-21 Online:2019-07-25 Published:2019-07-25

Abstract:

For many HPC users, computing cost is an important factor in deciding whether to move workloads to the public cloud. Alibaba Cloud provides the “preemptible instance”, an on-demand instance type that reduces the cost of using public cloud computing resources. The market price of a “preemptible instance” fluctuates and can be as low as 10% of the price of a “pay-as-you-go” instance. However, a “preemptible instance” is not guaranteed to be kept for as long as the user requires: it may be released by the datacenter scheduler or for other reasons, so it is suited to stateless scenarios. Based on users’ application types, job submission patterns, performance requirements, timing, and cost, we propose an auto scaling strategy on the public cloud for general HPC cluster schedulers that automatically deploys computing resources and controls cost, so that HPC users pay only for what they want and what they use. Building on the abundant resource types and rental models of the public cloud, and taking advantage of the auto scaling service, the “preemptible instance”, and application checkpoint/restart, we provide a low cost auto scaling model. When submitting jobs, users can set their expected cost; the auto scaling service then finds “preemptible instances” within this cost limit and uses the checkpoint/restart technique to keep jobs running while computing resources are exchanged. Finally, we verify the feasibility and effectiveness of our solution with the LAMMPS and GROMACS applications.
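
The cost-capped instance selection and the checkpoint/restart behaviour described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `Offer`, `pick_offer`, and `run_with_checkpoints` are hypothetical names, the prices are made up, and preemption is simulated rather than driven by any real cloud API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offer:
    """A hypothetical preemptible-instance offer (illustrative only)."""
    instance_type: str
    price_per_hour: float  # current market price, arbitrary currency units


def pick_offer(offers: list[Offer], max_price_per_hour: float) -> Optional[Offer]:
    """Return the cheapest offer at or below the user's expected cost, or None."""
    affordable = [o for o in offers if o.price_per_hour <= max_price_per_hour]
    return min(affordable, key=lambda o: o.price_per_hour, default=None)


def run_with_checkpoints(total_steps: int, checkpoint_every: int,
                         preempt_at: set[int]) -> tuple[int, int]:
    """Simulate a job that resumes from its last checkpoint after a preemption.

    preempt_at holds the progress values at which the instance is reclaimed;
    each preemption fires once. Returns (steps_executed, restarts).
    """
    if checkpoint_every <= 0:
        raise ValueError("checkpoint_every must be positive")
    pending = set(preempt_at)
    done = 0       # progress of the running job
    saved = 0      # progress recorded in the last checkpoint
    executed = 0   # total steps computed, including redone work
    restarts = 0
    while done < total_steps:
        if done in pending:
            # Instance reclaimed: lose work since the last checkpoint,
            # then restart on a replacement instance.
            pending.discard(done)
            done = saved
            restarts += 1
            continue
        done += 1
        executed += 1
        if done % checkpoint_every == 0:
            saved = done  # write a checkpoint
    return executed, restarts
```

For example, a 10-step job checkpointed every 4 steps and preempted once at step 6 rolls back to step 4 and redoes steps 5 and 6, so `run_with_checkpoints(10, 4, {6})` reports 12 executed steps and one restart. The checkpoint interval trades checkpoint overhead against the amount of work lost per preemption.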

Key words: high performance computing, public cloud, auto scaling, checkpoint/restart, low cost scaling model