• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2014, Vol. 36 ›› Issue (02): 216-221.

• Papers •

Design and implementation of a NIC based RDMA reliable communication protocol                 

XIA Jun, PANG Zhengbin, LIU Lu, ZHANG Jun, CHANG Junsheng

  1. (College of Computer, National University of Defense Technology, Changsha 410073, China)
  • Received: 2013-07-10 Revised: 2013-10-06 Online: 2014-02-25 Published: 2014-02-25

Abstract:

As high-performance computing systems continue to grow in size and complexity, reliability has become a crucial factor affecting their availability. The system network is an important component of a high-performance computing system, so its reliability must be considered in system design. To address failures that may occur in the system network, this paper proposes a NIC-based RDMA reliable communication protocol, presents a general framework for implementing the protocol, and discusses several optimized implementation methods based on that framework. The protocol and its implementation tolerate system network failures while reducing the overhead of reliable communication. Experimental results show that the performance of reliable RDMA communication is comparable to that of connectionless RDMA communication.

Key words: RDMA; reliability; network interface; reliable communication protocol