• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese-Language Core Journal

J4 ›› 2011, Vol. 33 ›› Issue (11): 132-139.

• Articles •

Software Fault-Tolerance Techniques for Transient Faults

XU Jianjun, TAN Qingping, XIONG Yinqiao, TAN Lanfang, LI Jianli

  1. (School of Computer Science, National University of Defense Technology, Changsha 410073, China)
  • Received: 2009-07-10 Revised: 2009-12-04 Online: 2011-11-25 Published: 2011-11-25

Abstract:

Transient faults caused by cosmic-ray radiation have long been one of the foremost challenges for computing in space applications. As integrated-circuit manufacturing technology continues to advance, the performance of modern processors has improved significantly, but their dependability is increasingly threatened by transient faults. Current techniques for tolerating transient faults fall broadly into two classes: hardware-implemented and software-implemented. Compared with the former, the latter have attracted considerable attention because of their advantages in implementation cost and flexibility. This paper first outlines the basic principles of transient fault tolerance and the main characteristics of the corresponding software-implemented techniques. It then introduces and analyzes recent representative research results on software fault tolerance at different implementation levels. Finally, it summarizes the characteristics and open problems of current research and offers suggestions for future research directions in software fault tolerance.

Key words: transient fault; soft error; software fault tolerance; redundant computing; dependable computing