J4 ›› 2014, Vol. 36 ›› Issue (02): 216-221.
• Articles •
XIA Jun,PANG Zhengbin,LIU Lu,ZHANG Jun,CHANG Junsheng
Abstract:
As high performance computing systems continue to grow in size and complexity, reliability has become a crucial factor affecting their availability. The system network is an important component of a high performance computing system, and its reliability must be considered in system design. To address failures that may occur in the system network, this paper proposes a NIC-based RDMA reliable communication protocol, gives a general framework for implementing the protocol, and discusses several optimized implementation methods based on that framework. The protocol and its implementation tolerate system network failures and reduce the overhead of achieving reliable communication. Experimental results show that the performance of RDMA reliable communication is comparable to that of connectionless RDMA communication.
Key words: RDMA; reliability; network interface; reliable communication protocol
XIA Jun,PANG Zhengbin,LIU Lu,ZHANG Jun,CHANG Junsheng. Design and implementation of a NIC based RDMA reliable communication protocol [J]. J4, 2014, 36(02): 216-221.
URL: http://joces.nudt.edu.cn/EN/
http://joces.nudt.edu.cn/EN/Y2014/V36/I02/216