• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2015, Vol. 37 ›› Issue (12): 2233-2241.

• Paper •

面向分布式流体系结构的多副本积极容错技术

李鑫1,3,4,林宇斐2,郭晓威1   

  1. (1.国防科学技术大学高性能计算国家重点实验室,湖南 长沙 410073;2.国防科学技术大学研究生院,湖南 长沙 410073;
    3.解放军理工大学,江苏 南京 210007;4.总参第六十三研究所,江苏 南京 210007)
  • 收稿日期:2015-09-08 修回日期:2015-11-26 出版日期:2015-12-25 发布日期:2015-12-25
  • Supported by:

    National Natural Science Foundation of China (61221491, 61303071)

A triple modular eager redundancy fault-tolerant technique for distributed stream architecture

LI Xin1,3,4,LIN Yufei2,GUO Xiaowei1   

  1. (1. The State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073;
    2. Graduate School, National University of Defense Technology, Changsha 410073;
    3. PLA University of Science and Technology, Nanjing 210007;
    4. The 63rd Research Institute of PLA General Staff Headquarters, Nanjing 210007, China)
  • Received:2015-09-08 Revised:2015-11-26 Online:2015-12-25 Published:2015-12-25

摘要:

随着互联网环境下计算系统规模的不断扩大,分布式流体系结构的可靠性问题面临着严峻的挑战。以多模冗余容错技术为基础,针对软错误提出了一种面向分布式流体系结构的多副本积极容错技术TREFT,利用三个程序副本进行高效的检错与纠错。在分布式流体系结构原型系统上的实验结果表明,该技术能有效提高系统的可靠性,具有较低的容错成本,平均增加10.77%的容错开销。

关键词: 分布式流体系结构, 容错技术, 三模冗余

Abstract:

As computing systems in the Internet environment continue to grow in scale, the reliability of the distributed stream architecture faces serious challenges. Building on the N-modular redundancy technique, we propose TREFT, a triple modular eager redundancy fault-tolerant technique for the distributed stream architecture that targets soft errors. TREFT uses three program copies to perform error detection and error correction efficiently. Experimental results on a prototype system of the distributed stream architecture show that TREFT effectively improves system reliability at low cost, adding 10.77% fault-tolerance overhead on average.

Key words: distributed stream architecture; fault-tolerant technique; triple modular redundancy
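
The abstract states that three program copies are used for error detection and correction but does not describe the mechanism. As a rough illustration of the general idea behind triple modular redundancy only, the following sketch shows a plain majority voter over three replica outputs. It is not the TREFT algorithm, and it does not model the eager scheme or the stream architecture; the names `tmr_vote` and `run_replicated` are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, Tuple


def tmr_vote(outputs: Sequence[Hashable]) -> Tuple[Hashable, bool]:
    """Generic majority vote over exactly three replica outputs.

    Returns (voted_value, error_detected). This is textbook TMR voting,
    assumed here for illustration; it is not taken from the paper.
    """
    if len(outputs) != 3:
        raise ValueError("expected exactly three replica outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count == 1:
        # All three replicas disagree: the error is detected but cannot
        # be corrected by voting alone.
        raise RuntimeError("no majority among replica outputs")
    # count == 3: all agree, no error. count == 2: one replica was
    # outvoted, so an error was detected and masked.
    return value, count < 3


def run_replicated(task: Callable[[int], Hashable], x: int) -> Tuple[Hashable, bool]:
    """Run a hypothetical task three times and vote on the results."""
    return tmr_vote([task(x) for _ in range(3)])
```

With two matching outputs and one corrupted output, for example `tmr_vote([42, 42, 43])`, the voter returns the majority value and reports that an error was detected; when all three outputs differ, it raises because no correction is possible.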