• Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4, 2013, Vol. 35, Issue 11: 175-181.


An instruction-level performance model of parallel programs based on hardware events and its application

LUO Hong-bing, WU Lin-ping

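  1. (High Performance Computing Center, Institute of Applied Physics and Computational Mathematics, Beijing 100094, China)
  • Received: 2013-08-10 Revised: 2013-10-18 Online: 2013-11-25 Published: 2013-11-25
  • Funding:

    Supported by the National High Technology Research and Development Program of China (863 Program) (2012AA01A309)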

Abstract:

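The gap between the sustained performance of applications and the peak performance of high-performance computers is tending to widen: many real applications reach only 5%~10% of machine peak, or even less, so optimizing parallel applications has become a central concern in high-performance computing. Starting from how hardware events can be used for instruction-level program optimization, we propose a performance model based on hardware events that reveals how program performance relates to program characteristics and microprocessor characteristics. Guided by this model, Euler and other programs were optimized on an Intel Xeon platform, reducing the execution time of performance-hotspot modules such as gas1dapproxy by 12%~61%. The optimization experiments show that the model makes instruction-level parallel performance optimization less difficult for users and guides them toward the right optimization direction.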

Key words: performance analysis; performance optimization; performance model; instruction-level parallelism
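The abstract does not state the model's equations. As a hedged illustration only, the sketch below uses a generic CPI-stack model, which is not necessarily the authors' formulation: execution time is T = N_instr × CPI / f, and CPI is split into a base term (the instruction-level parallelism the core achieves without stalls) plus one stall term per hardware event, each equal to the event count per instruction times an assumed cycle penalty. All field names, penalties, and counts are hypothetical; in practice the counts would come from a counter tool such as PAPI or Linux perf, and the penalties from processor documentation or microbenchmarks. The model also assumes stalls do not overlap, which overestimates stall time on out-of-order cores such as Xeon.

```python
from dataclasses import dataclass


@dataclass
class EventCounts:
    """Hardware-event counts for one hotspot region (hypothetical field names)."""
    instructions: int        # retired instructions
    l1_misses: int           # L1 data-cache misses that hit in L2
    l2_misses: int           # L2 misses served from main memory
    branch_mispredicts: int  # mispredicted branches


@dataclass
class Processor:
    """Microprocessor parameters; the values used below are illustrative."""
    freq_hz: float           # core clock frequency
    base_cpi: float          # cycles per instruction with no stalls
    l2_penalty: float        # cycles lost per L1 miss that hits in L2
    mem_penalty: float       # cycles lost per L2 miss
    branch_penalty: float    # cycles lost per branch misprediction


def cpi_stack(ev: EventCounts, cpu: Processor) -> dict[str, float]:
    """Split CPI into a base term plus one stall term per event type.

    Assumes stall terms simply add up (no overlap between stalls).
    """
    n = ev.instructions
    return {
        "base": cpu.base_cpi,
        "l2": ev.l1_misses * cpu.l2_penalty / n,
        "memory": ev.l2_misses * cpu.mem_penalty / n,
        "branch": ev.branch_mispredicts * cpu.branch_penalty / n,
    }


def predicted_time(ev: EventCounts, cpu: Processor) -> float:
    """T = N_instr * CPI / f, in seconds."""
    return ev.instructions * sum(cpi_stack(ev, cpu).values()) / cpu.freq_hz


def dominant_stall(ev: EventCounts, cpu: Processor) -> str:
    """Name of the largest stall term: a hint at where to optimize first."""
    stalls = {k: v for k, v in cpi_stack(ev, cpu).items() if k != "base"}
    return max(stalls, key=stalls.get)


def speedup_from_reduction(r: float) -> float:
    """Convert a fractional execution-time reduction r into a speedup."""
    return 1.0 / (1.0 - r)


if __name__ == "__main__":
    ev = EventCounts(instructions=1_000_000_000, l1_misses=20_000_000,
                     l2_misses=2_000_000, branch_mispredicts=5_000_000)
    cpu = Processor(freq_hz=2.0e9, base_cpi=0.5, l2_penalty=10.0,
                    mem_penalty=200.0, branch_penalty=15.0)
    print(cpi_stack(ev, cpu))
    print(predicted_time(ev, cpu), dominant_stall(ev, cpu))
    # 12% and 61% time reductions correspond to roughly 1.14x and 2.56x speedups.
    print(speedup_from_reduction(0.12), speedup_from_reduction(0.61))
```

With these made-up numbers the memory term (0.4 cycles per instruction) is the largest stall, so a model of this kind would point toward improving data locality before, for example, reducing branch mispredictions. The last line converts the reported 12% and 61% execution-time reductions into speedups of about 1.14× and 2.56×.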
