• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2014, Vol. 36 ›› Issue (08): 1435-1440.


Accelerating 3D GVF field computation on Xeon Phi using stencil optimization

QI Jin (齐金)1, LI Kuan (李宽)2, YANG Canqun (杨灿群)1, DU Yunfei (杜云飞)2

  (1. National Laboratory of Parallel and Distributed Processing, National University of Defense Technology, Changsha, Hunan 410073, China; 2. College of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China)
  • Received: 2013-08-12 Revised: 2013-11-11 Online: 2014-08-25 Published: 2014-08-25
  • Supported by:

    National 863 Program of China (2012AA010903); National Natural Science Foundation of China (61170049, 61303189)

Abstract:

The 3D gradient vector flow (GVF) field is widely used in 3D image analysis algorithms. Computing it requires many iterations and is computationally expensive, so improving its computation speed is of practical importance. This paper accelerates 3D GVF field computation on the Intel Xeon Phi many-integrated-core architecture for the first time. First, the natural parallelism among the pixels of a 3D image is exploited through thread-level parallelism (many cores) and data-level parallelism (SIMD). Second, 3D GVF field computation is a typical 3D 7-point stencil computation with a low computation-to-memory-access ratio. Based on the L2 cache specifications of the Xeon Phi architecture, an efficient data blocking strategy is proposed that exploits the temporal and spatial locality of the data, reduces the cache misses caused by the stencil computation, and improves performance. Experimental results show that the stencil optimizations significantly improve the speed of 3D GVF field computation. For a 512³ 3D image, the proposed method on a 57-core Xeon Phi achieves a speedup of up to 2.77× over a 2.6 GHz, 8-core, 16-thread Intel Xeon E5-2670 CPU.

Key words: 3D GVF field, Xeon Phi, stencil optimization, cache blocking
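
The abstract names two ideas that are easier to see in code: the GVF iteration as a 3D 7-point stencil, and cache blocking of that stencil. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the standard explicit GVF update u ← u + dt·(μ·∇²u − |∇f|²·(u − f_x)), which the abstract does not spell out. The function names (`idx`, `gvf_point`, `sweep_naive`, `sweep_blocked`, `run`), the block sizes `by` and `bz`, and the synthetic input data are all hypothetical. Only one field component is updated, and the boundary is held fixed.

```python
def idx(x, y, z, n):
    """Flatten (x, y, z) into a 1D index; x is the unit-stride dimension."""
    return (z * n + y) * n + x


def gvf_point(u, fx, mag2, c, n, mu, dt):
    """One explicit GVF update at flat index c using a 7-point stencil."""
    lap = (u[c - 1] + u[c + 1] + u[c - n] + u[c + n]
           + u[c - n * n] + u[c + n * n] - 6.0 * u[c])
    return u[c] + dt * (mu * lap - mag2[c] * (u[c] - fx[c]))


def sweep_naive(u, out, fx, mag2, n, mu, dt):
    """Plain z-y-x sweep over the interior points."""
    for z in range(1, n - 1):
        for y in range(1, n - 1):
            for x in range(1, n - 1):
                c = idx(x, y, z, n)
                out[c] = gvf_point(u, fx, mag2, c, n, mu, dt)


def sweep_blocked(u, out, fx, mag2, n, mu, dt, by=4, bz=4):
    """Same sweep, tiled in y and z so neighbouring planes are reused
    while still resident in cache; x stays contiguous for SIMD.
    The block sizes are arbitrary here, not derived from any L2 size."""
    for z0 in range(1, n - 1, bz):
        for y0 in range(1, n - 1, by):
            for z in range(z0, min(z0 + bz, n - 1)):
                for y in range(y0, min(y0 + by, n - 1)):
                    for x in range(1, n - 1):
                        c = idx(x, y, z, n)
                        out[c] = gvf_point(u, fx, mag2, c, n, mu, dt)


def run(sweep, n, iters, **block):
    """Iterate a sweep with two buffers (Jacobi style) on synthetic data."""
    # Synthetic stand-in for the x-derivative of an edge map (assumption).
    fx = [((i * 37) % 11) / 11.0 - 0.5 for i in range(n ** 3)]
    # Toy squared gradient magnitude: pretends the y and z derivatives are 0.
    mag2 = [v * v for v in fx]
    u = list(fx)
    out = list(u)  # boundary values are never written, so they stay fixed
    for _ in range(iters):
        sweep(u, out, fx, mag2, n, 0.1, 1.0, **block)
        u, out = out, u
    return u


if __name__ == "__main__":
    same = run(sweep_naive, 8, 5) == run(sweep_blocked, 8, 5, by=4, bz=2)
    print("blocked sweep matches naive sweep:", same)
```

Because each output point reads only the previous buffer, the traversal order does not change the result, which is why the loops can be tiled freely. Python is used here only for readability; any cache benefit from blocking would show up in compiled code, where a loop over blocks is also a natural unit to distribute across threads.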