• Journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2022, Vol. 44, Issue 11: 1901-1908.

• High Performance Computing •

A hierarchical barrier synchronization mechanism for many-core systems

臧照虎,李晨,王耀华,陈小文,郭阳   

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan 410073)


  • Received: 2021-11-22  Revised: 2022-03-22  Accepted: 2022-11-25  Published: 2022-11-25  Online: 2022-11-25
  • Funding:
    Research Program of the National University of Defense Technology (ZK20-04)

A hierarchical hardware barrier synchronization design for many-core processors

ZANG Zhao-hu, LI Chen, WANG Yao-hua, CHEN Xiao-wen, GUO Yang

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
  • Received: 2021-11-22  Revised: 2022-03-22  Accepted: 2022-11-25  Online: 2022-11-25  Published: 2022-11-25

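Abstract: Synchronization operations play an important role in guaranteeing data consistency and correctness among the threads of a multi-core processor. As the number of processor cores keeps growing, the overhead of synchronization grows with it. Barrier synchronization is one of the important methods for multi-core synchronization in parallel applications. Software synchronization methods typically need thousands of cycles to synchronize multiple cores, and this high-latency, serialized synchronization significantly degrades the performance of multi-core programs. Compared with software barriers, hardware barriers achieve lower synchronization latency; however, traditional centralized hardware barriers scale poorly and struggle to meet the synchronization needs of many-core processor systems. This paper proposes HSync, a hierarchical hardware barrier mechanism for many-core processors. It consists of local barrier units and global barrier units, which cooperate to achieve fast synchronization with low hardware overhead. Experimental results show that, compared with a traditional centralized hardware barrier, the hierarchical hardware barrier mechanism improves the performance of the many-core processor system by 1.13 times and reduces network traffic by 74%.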

Keywords: hardware synchronization, barrier, many-core system, parallel computing

Abstract: Synchronization plays an important role in ensuring data consistency and correctness among the threads of a multi-core processor. As the number of processor cores increases, so does the cost of synchronization. Barrier synchronization is one of the important methods for multi-core synchronization in parallel applications. Software synchronization methods typically require thousands of cycles to complete synchronization among multiple cores. This high-latency, serialized synchronization can significantly degrade the performance of multi-core programs. Compared with software barrier synchronization, a hardware barrier can achieve lower synchronization latency, but the scalability of the traditional centralized hardware barrier is limited, and it is difficult to adapt to the synchronization needs of many-core processor systems. This paper proposes a hierarchical hardware barrier mechanism called HSync for many-core processors. It consists of local and global barrier units, which work together to achieve fast synchronization with low hardware overhead. The experimental results show that the hierarchical hardware barrier mechanism improves the performance of the many-core processor system by 1.13 times and reduces network traffic by 74% compared with the traditional centralized hardware barrier.

Key words: hardware synchronization, barrier, many-core processors, parallel computing
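
The two-level arrangement described in the abstract (local barrier units that feed a global barrier unit) can be illustrated with a software analogue. The sketch below is not HSync itself, which is a hardware design whose internals are not given on this page. It is a minimal Python model that assumes cores are grouped into equally sized clusters; the names `HierarchicalBarrier` and `run_demo` are hypothetical and chosen only for illustration.

```python
import threading


class HierarchicalBarrier:
    """Software analogue of a two-level barrier (illustrative assumption,
    not the HSync hardware): threads first synchronize inside their own
    cluster, then one representative per cluster synchronizes globally."""

    def __init__(self, num_clusters, cores_per_cluster):
        # One local "arrive" and one local "release" barrier per cluster.
        self._arrive = [threading.Barrier(cores_per_cluster)
                        for _ in range(num_clusters)]
        self._release = [threading.Barrier(cores_per_cluster)
                         for _ in range(num_clusters)]
        # The global barrier is only ever touched by one thread per cluster.
        self._global = threading.Barrier(num_clusters)

    def wait(self, cluster_id):
        # Phase 1: local arrival. Barrier.wait() returns a distinct index
        # in range(parties) to each thread, so index 0 picks one thread.
        index = self._arrive[cluster_id].wait()
        if index == 0:
            # Phase 2: the cluster representative joins the global barrier.
            self._global.wait()
        # Phase 3: local release. Non-representatives block here until
        # their representative returns from the global barrier.
        self._release[cluster_id].wait()


def run_demo(num_clusters=2, cores_per_cluster=3, phases=5):
    """Run worker threads through several barrier phases and return a list
    of violations (threads that left a barrier before everyone arrived)."""
    total = num_clusters * cores_per_cluster
    barrier = HierarchicalBarrier(num_clusters, cores_per_cluster)
    arrivals = [0] * phases
    lock = threading.Lock()
    violations = []

    def worker(cluster_id):
        for phase in range(phases):
            with lock:
                arrivals[phase] += 1
            barrier.wait(cluster_id)
            with lock:
                if arrivals[phase] != total:
                    violations.append((cluster_id, phase, arrivals[phase]))

    threads = [threading.Thread(target=worker, args=(c,))
               for c in range(num_clusters)
               for _ in range(cores_per_cluster)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return violations
```

Calling `run_demo()` returns an empty list when no thread leaves a barrier early. In this model only one thread per cluster ever touches the global barrier, which loosely mirrors how a hierarchical design can keep most arrival signalling local; the performance and traffic figures quoted in the abstract come from the paper's own experiments and are not reproduced by this sketch.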