J4, 2010, Vol. 32, Issue (10): 38-43. doi: 10.3969/j.issn.1007130X.2010.
FU Hongyi, YANG Xuejun
Abstract:
Fault tolerance is critical to computer systems. In recent years, the ever-increasing complexity of computer architectures and advances in semiconductor technology have made chip density much higher. As a consequence, reliability has become a pressing issue not only for large-scale parallel systems but also for distributed environments and even desktop applications. This paper reviews a number of typical fault-tolerance techniques for hardware faults proposed in recent years, especially those designed for large-scale parallel systems. It then draws some preliminary conclusions and puts forward several potential research topics in this domain.
Key words: large-scale parallel computing; fault-tolerance technique; reliability
FU Hongyi, YANG Xuejun. A Survey of the Fault-Tolerance Techniques for Large-Scale Parallel Computing Systems[J]. J4, 2010, 32(10): 38-43.
URL: http://joces.nudt.edu.cn/EN/10.3969/j.issn.1007130X.2010.
http://joces.nudt.edu.cn/EN/Y2010/V32/I10/38