• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2013, Vol. 35 ›› Issue (11): 34-41.

• Articles •

Design and implementation of largepoint 1D FFT on GPU 

HE Tao1,2,ZHU Daiyin1   

  1. College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  2. Institute of Radar and Electronic Equipment, Aviation Industry Corporation of China, Wuxi 214063, China
  • Received: 2013-06-08   Revised: 2013-09-02   Online: 2013-11-25   Published: 2013-11-25

Abstract:

To exploit the GPU's computing power and parallel processor architecture, a parallel design method is studied that maps the parallel FFT algorithm onto the CUDA architecture. The method follows established optimization principles for GPU platforms: reducing global memory accesses, coalescing the global memory accesses that remain, using shared memory efficiently, and keeping the computation arithmetic-intensive. A large-point 1D FFT is then implemented on an NVIDIA Tesla C2075 GPU, which is based on the NVIDIA Fermi architecture. Experimental results show that the method outperforms the CUFFT library when the number of points is no larger than 256K, and that it runs two times faster than the CUFFT 4.0 library, which demonstrates that the new method is feasible and effective.
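
The abstract does not describe how the large-point transform is split into pieces small enough for on-chip memory. One common approach, assumed here purely for illustration and not confirmed as the authors' method, is the "four-step" Cooley-Tukey factorization N = n1 × n2: a batch of length-n1 transforms, a twiddle-factor multiplication, then a batch of length-n2 transforms. The Python sketch below shows only the arithmetic; the function names `dft` and `four_step_fft` are hypothetical, and the direct `dft` stands in for whatever small FFT a GPU thread block would run in shared memory.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT; a stand-in for a small transform done in fast memory."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def four_step_fft(x, n1, n2):
    """Length n1*n2 DFT built from small DFTs (illustrative assumption).

    Input index:  j = j1 * n2 + j2   (x viewed as an n1-by-n2 row-major matrix)
    Output index: k = k1 + n1 * k2
    """
    n = n1 * n2
    assert len(x) == n

    # Step 1: one length-n1 DFT down each of the n2 columns.
    cols = [dft([x[j1 * n2 + j2] for j1 in range(n1)]) for j2 in range(n2)]

    out = [0j] * n
    for k1 in range(n1):
        # Step 2: multiply by the twiddle factors exp(-2*pi*i * j2 * k1 / n).
        row = [
            cols[j2][k1] * cmath.exp(-2j * cmath.pi * j2 * k1 / n)
            for j2 in range(n2)
        ]
        # Step 3: one length-n2 DFT along each row, stored in transposed order.
        y = dft(row)
        for k2 in range(n2):
            out[k1 + n1 * k2] = y[k2]
    return out
```

In a GPU setting, n1 and n2 would typically be chosen so that each small transform fits in a thread block's shared memory, so that global memory is touched only once per step; that sizing is a general design consideration rather than a detail taken from the paper.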

Key words: CUDA 4.0; fast Fourier transform; GPU; high performance computing