Computer Engineering & Science, 2025, Vol. 47, Issue (02): 191-199.

• High Performance Computing •

GPU-accelerated RTL simulation with loop unrolling

TIAN Xi, LI Tun, CHENG Yue, PI Yan, ZOU Hongji

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
  • Received: 2023-08-18 Revised: 2023-12-07 Accepted: 2025-02-25 Online: 2025-02-25 Published: 2025-02-21
  • Funding:
    National Natural Science Foundation of China (U19A2062)

Abstract: As open-source and agile hardware design methodologies mature, efficient RTL (register-transfer level) simulation support for them is becoming increasingly important. The parallelism of GPUs makes it possible to accelerate RTL simulation by exploiting its structure-level and stimulus-level parallelism. However, because sequential designs contain feedback loops, achieving data-level parallelism within a single testbench remains a major challenge. This paper proposes a new method for accelerating RTL simulation on GPUs. Its core techniques are the identification and unrolling of feedback loops in RTL designs and an RTL circuit partitioning technique built on them. Together, circuit partitioning and loop unrolling exploit GPU parallelism along two dimensions within a single testbench: structural parallelism and data parallelism. Experimental results show that the proposed method achieves speedups of 1.2x to 107.1x over conventional GPU-based RTL simulation methods and 2.2x to 14x over ESSENT, currently the fastest RTL simulator.

Key words: RTL simulation, GPU acceleration, Python register transfer level (PyRTL), hardware construction language (HCL), loop unrolling
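
The abstract names feedback-loop identification as a core step but does not say how it is done. One common way to find such loops, shown here only as an illustrative sketch and not as the authors' algorithm, is to model the design as a directed dependency graph and report every strongly connected component that contains a cycle (Tarjan's algorithm). The function name `find_feedback_loops`, the edge-list input format, and the toy netlist are all hypothetical and are not taken from the paper or from PyRTL.

```python
from collections import defaultdict


def find_feedback_loops(edges):
    """Return the cyclic strongly connected components of a dependency graph.

    edges: iterable of (src, dst) pairs meaning "dst depends on src".
    Assumption: this edge list is a stand-in for a real netlist.
    """
    graph = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        nodes.update((src, dst))

    index = {}      # discovery order of each visited node
    lowlink = {}    # smallest discovery index reachable from the node
    stack, on_stack = [], set()
    loops = []
    counter = 0

    def visit(node):
        # Recursive Tarjan step; adequate for small illustrative graphs.
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph[node]:
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop its members off the stack.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            # Keep only components with a cycle: several nodes, or a self-edge.
            if len(component) > 1 or node in graph[node]:
                loops.append(sorted(component))

    for node in sorted(nodes):
        if node not in index:
            visit(node)
    return sorted(loops)


# Hypothetical toy design: an accumulator (acc <-> add) and a counter
# register that feeds itself (cnt -> cnt).
toy_edges = [
    ("in", "add"), ("add", "acc"), ("acc", "add"),
    ("acc", "out"), ("cnt", "cnt"), ("cnt", "out"),
]
print(find_feedback_loops(toy_edges))
```

In this toy graph the accumulator pair and the self-feeding counter are reported as loops, while `in` and `out` are not. A simulator built along the lines of the abstract would presumably treat such components as the units to unroll and would partition the remaining acyclic logic separately, though the abstract does not confirm these details.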