• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2011, Vol. 33 ›› Issue (3): 41-45. doi: 10.3969/j.issn.1007130X.2011.

• Paper •

Implementation and Optimization of Stencil Applications on GPUs

FANG Xudong,TANG Yuhua,WANG Guibin,TANG Tao   

  1. (School of Computer Science, National University of Defense Technology, Changsha 410073, China)
  • Received: 2009-07-26 Revised: 2009-10-21 Online: 2011-03-25 Published: 2011-03-25

Abstract:

With the rapid development of GPUs, using them to accelerate scientific computing applications is becoming an inevitable trend. In this paper, we port two representative subroutines, Rprj3 and Interp, from Mgrid, a SPEC2000 benchmark rich in stencil operations, to an AMD GPU using Brook+. Using the thread granularity tuning mechanism provided by Brook+, we implement several versions of the ported programs and analyze their performance. We also summarize how thread granularity tuning can be used to optimize the porting of stencil programs. Our experimental results show that, at the largest problem size, Rprj3 achieves a speedup of 5.37 over its CPU version, while Interp achieves a speedup of 12.8 over its CPU version.
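
As a rough illustration of what thread granularity means for a stencil computation, the following minimal sketch emulates, in plain Python rather than Brook+, a launch in which each logical thread computes a configurable number of consecutive output points. It is not the paper's Rprj3 or Interp kernel: the 1D grid, the 3-point weights, and the names `stencil_point` and `run_stencil` are all hypothetical choices made for the example.

```python
# Illustrative only: a 1D 3-point stencil, not the Rprj3/Interp kernels.
# All names and weights below are assumptions made for this sketch.

def stencil_point(src, i):
    """Weighted 3-point stencil at interior index i (hypothetical weights)."""
    return 0.25 * src[i - 1] + 0.5 * src[i] + 0.25 * src[i + 1]


def run_stencil(src, granularity):
    """Emulate a launch where each logical thread computes `granularity`
    consecutive interior outputs. Returns (result, number_of_threads)."""
    n = len(src)
    dst = list(src)                      # boundary values are copied unchanged
    interior = n - 2                     # interior indices are 1 .. n-2
    num_threads = (interior + granularity - 1) // granularity  # ceiling division
    for tid in range(num_threads):       # on a GPU these iterations run in parallel
        start = 1 + tid * granularity
        stop = min(start + granularity, n - 1)
        for i in range(start, stop):     # sequential work inside one thread
            dst[i] = stencil_point(src, i)
    return dst, num_threads


if __name__ == "__main__":
    data = [float(i * i) for i in range(18)]
    for g in (1, 2, 4):
        out, threads = run_stencil(data, g)
        print("granularity", g, "threads", threads, "first outputs", out[1:4])
```

The result is the same for every granularity; only the number of threads and the amount of work per thread change. A coarser granularity lets one thread reuse neighbouring inputs it has already read, at the cost of fewer threads in flight, which is the kind of trade-off that tuning explores.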

Key words: GPU; optimization; stencil