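• Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)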


Performance optimization of stencil computation on ARM64 multi-core microprocessors

FENG Lu-xia (冯璐霞), LI Chun-jiang (李春江), HUANG Ya-bin (黄亚斌)

  1. (College of Computer, National University of Defense Technology, Changsha 410073, China)
  • Received: 2017-01-07 Revised: 2017-03-20 Online: 2017-05-25 Published: 2017-05-25
  • Funding:

    National Natural Science Foundation of China (61170046); National High Technology Research and Development Program of China (863 Program) (2012AA010903)

Abstract:

Stencil computation is an important class of computational kernels, widely used in fields ranging from image and video processing to large-scale scientific and engineering computing. However, studies on optimizing stencil performance for high-performance ARM64 processors remain rare. To parallelize and optimize typical stencil kernels on ARM64 multi-core microprocessors, we propose an optimization method based on two-dimensional binding, designed around the characteristics of the AMCC XGENE2 and Phytium FT1500A multi-core microprocessors. By binding threads to CPUs and binding threads to data blocks, the method reduces the parallel overhead of thread scheduling and increases the cache hit rate. Experimental results show that the method improves stencil performance on ARM64 multi-core microprocessors and scales well on both ARM64 platforms.
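The two bindings named in the abstract can be illustrated with a small sketch. The abstract does not give the authors' implementation, so everything below is an assumption made for illustration: the 5-point Jacobi kernel, the row-wise block partition, and the hypothetical helper names `row_blocks`, `pin_current_thread`, `jacobi_bound`, and `jacobi_serial`. Python is used only for compactness. Because of the interpreter lock it shows the structure of the technique, not its speedup; a production version would more likely be written in C with OpenMP or pthreads.

```python
import os
import threading


def row_blocks(n_rows, n_threads):
    """Split interior rows 1..n_rows-2 into contiguous [start, stop) blocks."""
    base, extra = divmod(n_rows - 2, n_threads)
    blocks, start = [], 1
    for t in range(n_threads):
        stop = start + base + (1 if t < extra else 0)
        blocks.append((start, stop))
        start = stop
    return blocks


def pin_current_thread(cpu):
    """Thread-CPU binding (best effort, Linux only). Returns True on success."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    try:
        os.sched_setaffinity(0, {cpu})  # 0 refers to the calling thread on Linux
        return True
    except OSError:
        return False


def jacobi_bound(grid, steps, n_threads):
    """Run `steps` sweeps of a 5-point Jacobi stencil using both bindings."""
    n_cols = len(grid[0])
    bufs = [[row[:] for row in grid], [row[:] for row in grid]]
    blocks = row_blocks(len(grid), n_threads)
    cpus = (sorted(os.sched_getaffinity(0))
            if hasattr(os, "sched_getaffinity") else [0])
    barrier = threading.Barrier(n_threads)

    def worker(t):
        pin_current_thread(cpus[t % len(cpus)])  # binding 1: thread to CPU
        start, stop = blocks[t]                  # binding 2: thread to data block
        src, dst = bufs[0], bufs[1]
        for _ in range(steps):
            # The same thread updates the same rows in every sweep, so the
            # block it touched last time is the one it touches again.
            for i in range(start, stop):
                up, mid, down, out = src[i - 1], src[i], src[i + 1], dst[i]
                for j in range(1, n_cols - 1):
                    out[j] = 0.25 * (up[j] + down[j] + mid[j - 1] + mid[j + 1])
            barrier.wait()       # all blocks finished before buffers swap
            src, dst = dst, src

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return bufs[steps % 2]


def jacobi_serial(grid, steps):
    """Single-threaded reference used to check the bound version."""
    src = [row[:] for row in grid]
    for _ in range(steps):
        dst = [row[:] for row in src]
        for i in range(1, len(src) - 1):
            for j in range(1, len(src[0]) - 1):
                dst[i][j] = 0.25 * (src[i - 1][j] + src[i + 1][j]
                                    + src[i][j - 1] + src[i][j + 1])
        src = dst
    return src
```

In this sketch the block assignment is fixed before the first sweep and never changes, which is what lets a pinned thread keep finding its rows in the cache of the core it runs on. A dynamic schedule would instead hand rows to whichever thread is free, giving up that reuse.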
 

Key words: stencil computation; ARM64; AMCC XGENE2; FT1500A; parallelism; thread binding