• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (04): 594-604.

• High Performance Computing •

An application log analysis and system automation optimization framework for Lustre cluster storage

CHENG Wen1, LI Yan2, ZENG Ling-fang3, WANG Fang1, TANG Shi-cheng2, YANG Li-ping2, FENG Dan1, ZENG Wen-jun2

  (1. Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology,
    Key Laboratory of Information Storage System, Engineering Research Center of Data Storage Systems and Technology,
    Ministry of Education of China, Wuhan 430074;
    2. China National GeneBank, BGI-Shenzhen, Shenzhen 518120;
    3. Zhejiang Lab, Hangzhou 311121, China)
  • Received: 2021-08-13 Revised: 2021-11-11 Accepted: 2022-04-25 Online: 2022-04-25 Published: 2022-04-20

Abstract: In scientific computing, big data processing, and artificial intelligence, studying application workloads and analyzing their I/O patterns to reveal how the load changes over time is essential for guiding the performance optimization of cluster storage systems. Applications today are numerous and are updated rapidly and iteratively, and this complex environment makes mining workload features challenging. To address these problems, we collected application logs from five Lustre cluster storage systems in a production environment over 326 days, analyzed the access and load characteristics of the application workloads, and verified and supplemented existing observations. Through horizontal, vertical, and multi-dimensional comparative analysis and information mining of the logs, we summarize four findings and explore their relationship to previous research. Combining the actual production environment with the corresponding system optimization strategies, we give feasible implementation schemes, which provide references and suggestions for users, maintainers, upper-layer application developers, multi-tier storage system designers, and others. Because the practical application environment is complex and system optimization is time-consuming, we also design and implement a system automation optimization framework (SAOF). SAOF provides functions such as resource reservation and bandwidth limitation for specified application workloads. Preliminary tests show that SAOF can provide automatic QoS guarantees for different tasks according to system resources and task load requirements.


Key words: Lustre file system, log analysis, system optimization, quality of service (QoS), resource management