• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal
Article

Fault-Tolerance Techniques for On-Board Parallel Computer System Based on Distributed Architecture

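  • (School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China)
Wang Weicheng (born 1981), male, from Xi'an, Shaanxi, master's student; research interests: operating systems, fault-tolerance techniques, and computer networks. Luo Yu (born 1963), male, from Hengyang, Hunan, PhD, professor; research interests: operating systems and computer networks.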

Received: 2009-07-03

  Revised: 2009-11-05

  Published online: 2011-03-25

FaultTolerance Techniques for OnBoard Parallel  Computer System Based on Distributed Architecture

Expand
  • (School of Computer Science,National University of Defense Technology,Changsha 410073,China)

Received date: 2009-07-03

  Revised date: 2009-11-05

  Online published: 2011-03-25

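Abstract

On-board computers need fault-tolerance techniques to meet the reliability requirements of operating in outer space. Current multi-computer on-board systems are usually designed with a master-slave structure in which fault-tolerance strategy control is concentrated on a single master node; this structure carries the hidden risk that a single point of failure paralyzes the whole system. To address this, this paper proposes an on-board parallel fault-tolerant computer system with a distributed architecture, which distributes the centrally controlled fault-tolerance components across the individual nodes and thereby improves the fault-tolerance reliability of the system. On this architecture, fault-tolerance strategies for the computing nodes, the fault-tolerance components, and I/O are proposed, and the corresponding models and simulation test results are given, providing useful guidance and reference for the development of similar projects.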

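Cite this article

Wang Weicheng, Luo Yu. Fault-Tolerance Techniques for On-Board Parallel Computer System Based on Distributed Architecture[J]. Computer Engineering & Science, 2011, 33(3): 51-56. DOI: 10.3969/j.issn.1007130X.2011.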

Abstract (English version)

Fault-tolerant techniques can provide high reliability for on-board computers running in outer space. Current multi-node on-board systems are designed as a master-slave structure, which concentrates the fault-tolerance strategy in the master node and thereby creates a single point of failure. A parallel fault-tolerant computer system with a distributed framework is proposed in this paper. Based on the framework, the computing nodes and fault-tolerant units are designed and some novel fault-tolerant strategies are introduced. Our work can serve as an important guideline for the development of related projects.
