• Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (4): 580-589.

• High Performance Computing •

HSI: A high-bandwidth and low-latency protocol conversion mechanism for multiple chiplets

WANG Yong, YANG Qianming, FU Wenwen, WANG Yongwen

  1. (1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China;
    2. Key Laboratory of Advanced Microprocessor Chips and Systems, Changsha 410073, China)
  • Received: 2024-10-18 Revised: 2025-01-18 Online: 2026-04-25 Published: 2026-04-29
  • Supported by:
    Independent Scientific Research Fund of the National University of Defense Technology (22-TDRCJH-02-006)

Abstract: Chiplet technology has emerged as a promising way to extend Moore's Law and improve chip performance, owing to its low cost, high yield, and high integration density. Current research on inter-chiplet transmission focuses mainly on high-speed interconnect interfaces, while protocol conversion from the network-on-chip (NoC) to the chiplet interface remains largely unexplored, which makes this conversion a bottleneck for inter-chiplet transmission latency and bandwidth. This paper proposes HSI, a high-bandwidth and low-latency protocol conversion mechanism. HSI uses a combined round-robin scheduling strategy to read multiple types of Flits from the NoC, reducing transmission latency and increasing bandwidth. It uses a multi-slice packet format to encapsulate Flits, improving effective bandwidth utilization, and adopts a multi-write single-read queue structure to support parallel memory access for multiple Flits, reducing parsing latency. To evaluate HSI, the mechanism is implemented and verified for the mainstream CHI network protocol and the UCIe chiplet interface protocol. The results show that HSI achieves a transmission bandwidth of up to 512 Gbit/s, which matches the transmission bandwidth of 32-lane UCIe and the memory access bandwidth of DDR5. The transmission latency of a single Flit is only 6.05 ns, and the average transmission latency of Flits in a burst stream is 1.2 to 1.7 ns.

Key words: multiple chiplets, network-on-chip (NoC), protocol conversion mechanism, coherent hub interface (CHI) protocol, universal chiplet interconnect express (UCIe) protocol
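
The abstract names a combined round-robin scheduling strategy for reading multiple Flit types from the NoC but does not give its details. As a rough illustration of the underlying idea only, the following is a minimal sketch of plain round-robin reading across per-channel Flit queues. It assumes the four CHI channel types (REQ, RSP, SNP, DAT) as queue labels; the class `RoundRobinReader` and its methods are hypothetical names introduced here and are not taken from the paper, and the sketch does not model HSI's combined strategy, packet format, or timing.

```python
from collections import deque

# The four CHI channel types, used here only as queue labels (assumption:
# one queue per channel; the paper's actual queue layout is not given).
CHANNELS = ("REQ", "RSP", "SNP", "DAT")


class RoundRobinReader:
    """Plain round-robin arbiter over per-channel Flit queues (illustrative)."""

    def __init__(self, channels=CHANNELS):
        self.channels = tuple(channels)
        self.queues = {ch: deque() for ch in self.channels}
        self._next = 0  # index of the channel that is checked first next time

    def push(self, channel, flit):
        """Enqueue a Flit on the queue of the given channel."""
        self.queues[channel].append(flit)

    def pop(self):
        """Return (channel, flit) from the next non-empty queue, or None."""
        n = len(self.channels)
        for offset in range(n):
            idx = (self._next + offset) % n
            ch = self.channels[idx]
            if self.queues[ch]:
                # The channel after the one just served gets priority next.
                self._next = (idx + 1) % n
                return ch, self.queues[ch].popleft()
        return None  # all queues are empty

    def drain(self):
        """Pop until every queue is empty and return the service order."""
        order = []
        while (item := self.pop()) is not None:
            order.append(item)
        return order
```

With two REQ Flits, one RSP Flit, and one DAT Flit pending, `drain()` serves REQ, RSP, DAT, and then the second REQ, so no single Flit type can monopolize the read port while others are waiting.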