• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2025, Vol. 47, Issue 11: 1901-1911.

• High Performance Computing •

Configurable CPU performance analysis method for specific applications

DENG Quan, LIN Rongzhen, LUO Li, LU Jianzhuang, WANG Yongwen

  1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
  2. Key Laboratory of Advanced Microprocessor Chips and Systems, Changsha 410073, China

  • Received: 2024-11-06 Revised: 2025-01-06 Online: 2025-11-25 Published: 2025-12-04
  • Funding:
    National Natural Science Foundation of China (62202481); Research Program of the National University of Defense Technology (ZK22-05); Independent Research Project for High-Level Science and Technology Innovation Talents (22-TDRCJH-02-006); PDL Open Fund (WDZC20235250112); Furong Program Science and Technology Innovation Huxiang Young Talents Project (2024RC3116)

Abstract: As integrated circuits advance and chip applications continue to expand, configurable CPUs make it easier to explore the chip design space. A configurable CPU can meet the demands of agile design while still letting users tune the design for a target application. At present, however, performance tuning of application-specific configurable CPUs relies mainly on experienced architecture engineers and lacks a systematic method to guide it. This paper therefore proposes a performance analysis method for application-specific configurable CPUs. At the software level, the Perf tool is used to quickly locate the hot code blocks of an application as it executes on the hardware. At the hardware level, the analysis framework provides two counting modes (clock-cycle counting and slots counting) that identify the hot spots in each execution unit, so that designers can quickly locate hot behavior in hardware execution. The method is applied to the agile design of a configurable DMR (dual-module redundancy) architecture supporting the RISC-V instruction set, using NPB (NAS Parallel Benchmarks), a representative fluid dynamics workload. Experimental results show that, after iteration, the single-core performance of the configurable CPU improves by 13.2%, while area overhead increases by 12.2%.


Keywords: performance analysis, configurable CPU, performance monitoring unit (PMU), NAS Parallel Benchmarks (NPB), testing
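
The abstract names two hardware counting modes, clock-cycle counting and slots counting, without defining them. The following is a minimal Python sketch of how the two modes could differ. It assumes the common top-down reading in which a "slot" is one issue opportunity per cycle. The issue width, the unit names (`alu`, `lsu`, `fpu`), the `CycleSample` type, and the synthetic trace are all hypothetical and are not taken from the paper or from any real PMU interface.

```python
from dataclasses import dataclass

ISSUE_WIDTH = 4  # assumed slots per cycle; hypothetical, not from the paper


@dataclass
class CycleSample:
    """One simulated cycle: issue slots used by each (hypothetical) unit."""
    used_slots: dict  # unit name -> number of slots used in this cycle


def count_cycles(trace, unit):
    """Cycle-count mode: cycles in which `unit` used at least one slot."""
    return sum(1 for c in trace if c.used_slots.get(unit, 0) > 0)


def count_slots(trace, unit):
    """Slots-count mode: total issue slots used by `unit` over the trace."""
    return sum(c.used_slots.get(unit, 0) for c in trace)


def slot_utilization(trace, unit, width=ISSUE_WIDTH):
    """Fraction of all available slots (cycles * width) used by `unit`."""
    total = len(trace) * width
    return count_slots(trace, unit) / total if total else 0.0


# Synthetic four-cycle trace, for illustration only.
trace = [
    CycleSample({"alu": 2, "lsu": 1}),
    CycleSample({"alu": 1}),
    CycleSample({"fpu": 4}),
    CycleSample({}),  # no unit issued anything in this cycle
]

for unit in ("alu", "lsu", "fpu"):
    print(unit, count_cycles(trace, unit), count_slots(trace, unit),
          slot_utilization(trace, unit))
```

In this synthetic trace the `alu` unit is active in more cycles than `fpu`, yet `fpu` consumes more slots. That illustrates why the two modes can rank hot units differently under the assumptions stated above.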