• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学) ›› 2021, Vol. 43 ›› Issue (09): 1538-1545.

• High Performance Computing •

面向高带宽I/O的片上网络优化

石伟,龚锐,刘威,王蕾,冯权友,张剑锋   

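  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan 410073, China)
  • Received: 2020-08-08 Revised: 2021-04-12 Accepted: 2021-09-25 Online: 2021-09-25 Published: 2021-09-24
  • Funding:
    National Science and Technology Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (2017ZX01028-103-002); Key R&D Program of the Ministry of Science and Technology (2020AAA0104602, 2018YFB2202603); National Natural Science Foundation of China (61832018)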

Network-on-Chip optimization for high bandwidth I/O in processors

SHI Wei, GONG Rui, LIU Wei, WANG Lei, FENG Quan-you, ZHANG Jian-feng

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
  • Received: 2020-08-08 Revised: 2021-04-12 Accepted: 2021-09-25 Online: 2021-09-25 Published: 2021-09-24

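Abstract (translated from the Chinese): In high-performance processors, I/O bandwidth demand keeps increasing: the number of lanes on high-speed interfaces keeps growing, and interface transfer rates are also rising. The network-on-chip (NoC) of a high-performance processor must match the bandwidth requirements of the various high-speed I/O interfaces and must guarantee that DMA requests complete correctly. However, the communication mechanisms of high-speed interface protocols differ substantially from those of NoC protocols, which can lead to deadlock and related problems. This paper first analyzes the problems an NoC faces in matching high-performance I/O, then proposes a high-bandwidth I/O design method and a deadlock resolution method. An NoC that adopts the deadlock resolution method makes the I/O system more robust, relaxes various restrictions on NoC design and operation, and improves I/O performance. Finally, the proposed optimizations are applied to a high-performance server processor chip and evaluated: for a 16-lane PCIe 4.0 interface, the bidirectional read and write bandwidths each reach 30 GB/s, and when deadlock occurs in certain special scenarios, the NoC automatically detects and resolves it.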


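Keywords (translated from the Chinese): network-on-chip, protocol conversion, high bandwidth, deadlock detection, deadlock resolution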

Abstract: In high-performance processors, the demand for I/O bandwidth keeps increasing: high-speed interfaces use more and more lanes, and interface transmission rates are also rising. The Network-on-Chip (NoC) of a high-performance processor must match the bandwidth requirements of the various high-speed I/O interfaces and must ensure that direct memory access (DMA) requests complete correctly. However, the communication mechanisms of high-speed interface protocols differ considerably from those of on-chip interconnection network protocols, which may lead to deadlock and other problems. This paper first analyzes the problems that arise when an NoC must match high-performance I/O, and then proposes a method for designing a high-bandwidth I/O interface and a method for resolving deadlock. An NoC with the deadlock resolution technique makes the I/O system more robust and relaxes various limitations on NoC design. Finally, the proposed optimization method is implemented and evaluated on a server processor. For a 16-lane PCIe Gen4 interface, the read and write bandwidths each reach up to 30 GB/s. In some special scenarios, particular transaction sequences produce deadlock, and the NoC automatically detects and resolves it.
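To put the reported 30 GB/s in context, the short calculation below estimates the per-direction peak of a 16-lane PCIe Gen4 link. It is only a back-of-the-envelope sketch: the 16 GT/s lane rate and 128b/130b line coding are standard PCIe 4.0 link parameters rather than figures from the paper, packet-level overhead is ignored, and the 30 GB/s value is assumed to be a per-direction number.

```python
# Assumed PCIe 4.0 link parameters (standard values, not taken from the paper).
TRANSFERS_PER_S_PER_LANE = 16e9      # 16 GT/s per lane
ENCODING_EFFICIENCY = 128 / 130      # 128b/130b line coding


def link_bandwidth_gbytes(lanes: int) -> float:
    """Per-direction link bandwidth in GB/s (10^9 bytes/s), before packet overhead."""
    bits_per_s = lanes * TRANSFERS_PER_S_PER_LANE * ENCODING_EFFICIENCY
    return bits_per_s / 8 / 1e9


peak = link_bandwidth_gbytes(16)
reported = 30.0  # GB/s, as stated in the abstract (assumed per direction)

print(f"x16 peak per direction: {peak:.2f} GB/s")   # x16 peak per direction: 31.51 GB/s
print(f"reported / peak: {reported / peak:.1%}")    # reported / peak: 95.2%
```

Under these assumptions, the reported bandwidth is roughly 95% of the encoded link rate.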

Key words: network-on-chip, protocol conversion, high bandwidth, deadlock detection, deadlock resolution
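
The abstract does not describe how the NoC detects or resolves deadlock. As a generic illustration only, and not the authors' mechanism, the sketch below models blocked requests as a wait-for graph, treats a cycle in that graph as a deadlock, and resolves it by letting one victim stop waiting. The function name `find_cycle` and the request names `req_A` to `req_D` are hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(wait_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of the wait-for graph as a list of nodes, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited, on the DFS stack, finished
    color: Dict[str, int] = {}
    stack: List[str] = []

    def dfs(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        stack.append(node)
        for nxt in wait_for.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:         # back edge: nxt is already on the stack
                return stack[stack.index(nxt):]
            if state == WHITE:
                cycle = dfs(nxt)
                if cycle is not None:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(wait_for):
        if color.get(node, WHITE) == WHITE:
            cycle = dfs(node)
            if cycle is not None:
                return cycle
    return None


# Hypothetical scenario: each request waits on a resource held by the next one.
wait_for = {
    "req_A": ["req_B"],
    "req_B": ["req_C"],
    "req_C": ["req_A"],   # closes the cycle, so A, B and C are deadlocked
    "req_D": ["req_B"],   # blocked behind the cycle but not part of it
}

cycle = find_cycle(wait_for)
print(cycle)  # ['req_A', 'req_B', 'req_C']

# Resolution in this toy model: the last request in the cycle is chosen as the
# victim and stops waiting (for example, by being drained to a spare buffer).
wait_for[cycle[-1]] = []
print(find_cycle(wait_for))  # None
```

Choosing the last node of the detected cycle as the victim is arbitrary here; a real design would pick whichever request is cheapest to drain or retry.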