J4 ›› 2013, Vol. 35 ›› Issue (11): 54-61.
• Articles •
ZHANG Yi, CHEN Liang, PANG Jian
Abstract:
As high-performance computing cluster systems grow in number and scale,
system maintenance becomes more difficult and the maintenance workload increases.
The software system introduced in this paper runs across multiple Linux clusters
with different hardware and software environments. It automatically monitors key
operating states and indicators of the clusters through command-line scripts and
programs, and promptly sends fault messages to system administrators' Windows
terminals via socket communication. Results show that the system improves
maintenance efficiency and shortens fault response time. It also uses a database
to record and manage fault event data, thereby standardizing the fault-handling process.
Key words: cluster; fault; monitor; manage; database
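The monitor-and-notify flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the disk-usage check, the JSON message format, and every function name here (`check_disk_usage`, `format_fault`, `send_fault`, `run_checks`) are assumptions chosen only to show a check on a cluster node forwarding a fault message over a TCP socket to a listener on an administrator's machine.

```python
import json
import shutil
import socket
import time


def check_disk_usage(path="/", threshold=0.90):
    """Return a fault description if the filesystem at `path` is too full, else None.

    Hypothetical example check; the paper does not list its actual checks.
    """
    usage = shutil.disk_usage(path)
    fraction = usage.used / usage.total
    if fraction >= threshold:
        return f"disk usage on {path} is {fraction:.0%}"
    return None


def format_fault(cluster, check, detail, timestamp=None):
    """Encode one fault event as a newline-terminated JSON message.

    The field names are an assumed format, not the one used in the paper.
    """
    event = {
        "cluster": cluster,
        "check": check,
        "detail": detail,
        "time": time.time() if timestamp is None else timestamp,
    }
    return (json.dumps(event, sort_keys=True) + "\n").encode("utf-8")


def send_fault(host, port, payload, timeout=5.0):
    """Open a TCP connection to the administrator's listener and send one message."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)


def run_checks(cluster, host, port):
    """Run every registered check once and forward any faults; return the number sent."""
    checks = {"disk": check_disk_usage}
    sent = 0
    for name, check in checks.items():
        detail = check()
        if detail is not None:
            send_fault(host, port, format_fault(cluster, name, detail))
            sent += 1
    return sent
```

In a deployment along these lines, `run_checks` would be invoked periodically on each cluster (for example from cron), and the receiving side could write each decoded event into a database table to keep the fault history the abstract mentions.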
ZHANG Yi, CHEN Liang, PANG Jian. Fault monitoring and management system for multiple computing clusters [J]. J4, 2013, 35(11): 54-61.
URL: http://joces.nudt.edu.cn/EN/
http://joces.nudt.edu.cn/EN/Y2013/V35/I11/54