• Journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2020, Vol. 42, Issue 10 (High-Performance Special Issue): 1807-1814.


Design and implementation of the maintenance and management platform for the Magic Cube-3 high-performance computer

ZHAO Qi-qi   

  1. (Shanghai Supercomputer Center, Shanghai 201203, China)
  • Received: 2020-06-04 Revised: 2020-07-15 Accepted: 2020-10-25 Online: 2020-10-25 Published: 2020-10-23

Abstract:

With the progress of science and technology, high-performance computers have become important infrastructure for scientific research and provide strong support for the development of many industries. Ensuring that these machines operate stably and efficiently is both the goal and the responsibility of their administrators. This paper introduces the maintenance and management platform for the "Magic Cube-3" supercomputer. It covers the platform's architecture design, the underlying data collection interfaces and methods, and the functions the platform implements, including system monitoring, automatic detection, and data analysis. The platform gives administrators a direct view of the machine's operating status so that they can detect and handle faults promptly. By collecting and analyzing data from multiple perspectives, administrators can identify the bottlenecks that reduce operating efficiency, which provides a scientific basis for decisions on subsequent optimization and upgrades.
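As a rough illustration only, the automatic detection and data analysis functions summarized above might be sketched as follows. This is not the platform's implementation, which the abstract does not describe; the `NodeSample` fields, the `THRESHOLDS` values, and the `detect_faults` and `idle_fraction` helpers are hypothetical, assuming per-node metric samples have already been gathered by some data collection layer.

```python
from dataclasses import dataclass

# Hypothetical per-node sample; field names are illustrative assumptions,
# not the platform's actual data collection schema.
@dataclass
class NodeSample:
    node: str
    cpu_util: float     # percent, 0-100
    mem_util: float     # percent, 0-100
    temperature: float  # degrees Celsius
    reachable: bool     # whether the collector got a response from the node

# Assumed alert thresholds; a real deployment would tune these per hardware.
THRESHOLDS = {"mem_util": 95.0, "temperature": 85.0}

def detect_faults(samples):
    """Return (node, reason) pairs for samples that look unhealthy."""
    alerts = []
    for s in samples:
        if not s.reachable:
            alerts.append((s.node, "unreachable"))
            continue  # other metrics are meaningless without a response
        for metric, limit in THRESHOLDS.items():
            value = getattr(s, metric)
            if value > limit:
                alerts.append((s.node, f"{metric}={value} exceeds {limit}"))
    return alerts

def idle_fraction(samples, idle_below=10.0):
    """Fraction of reachable nodes whose CPU use is below idle_below percent.

    A persistently high value may hint at a scheduling or utilization bottleneck.
    """
    reachable = [s for s in samples if s.reachable]
    if not reachable:
        return 0.0
    return sum(s.cpu_util < idle_below for s in reachable) / len(reachable)

if __name__ == "__main__":
    samples = [
        NodeSample("node001", 98.0, 60.0, 70.0, True),
        NodeSample("node002", 3.0, 97.5, 72.0, True),
        NodeSample("node003", 0.0, 0.0, 0.0, False),
        NodeSample("node004", 5.0, 40.0, 88.0, True),
    ]
    for node, reason in detect_faults(samples):
        print(node, reason)
    print(f"idle fraction: {idle_fraction(samples):.2f}")
```

Simple fixed thresholds are chosen here only for clarity; a production monitoring system would more likely combine historical baselines, hardware event logs, and job scheduler data.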

Key words: high-performance computer, maintenance and management, system monitoring, data analysis