• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (05): 799-806.


Optimization of Gaussian filtering algorithm on FT-M7002

CHEN Yun 1,2, WANG Meng-yuan 1,2, CHAI Xiao-nan 1,2, SHANG Jian-dong 1,2

  1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China

  2. Supercomputing Center of Henan Province (Zhengzhou University), Zhengzhou 450052, China


  • Received: 2020-12-17 Revised: 2021-03-04 Accepted: 2021-05-25 Online: 2021-05-25 Published: 2021-05-19

Abstract: As the domestically developed Feiteng series of high-performance DSP processors is adopted in image processing, demand is growing for high-performance image processing algorithms on this platform. Gaussian filtering, a basic image processing algorithm, effectively suppresses Gaussian noise and is widely used in the field. Based on the architectural characteristics of the Feiteng high-performance DSP and the properties of the Gaussian filtering algorithm, this paper optimizes Gaussian filtering on the FT-M7002. Manual vectorization, control flow elimination, and loop unrolling are applied to exploit data-level and instruction-level parallelism, reducing the number of memory accesses and improving instruction efficiency. To match the DMA hardware and vector memory structure of the FT-M7002 core, ping-pong buffering and DMA-based array transposition are used to reduce data transfer time and improve data locality. Tests across a range of filter kernel sizes and image matrix sizes show that the parallel optimized implementation achieves a speedup of 1.3 to 1.41 over the serial implementation of the Gaussian filtering algorithm. With the cache enabled, it achieves a speedup of 1.15 to 1.71 over the Gaussian filtering routine in the dsplib library running on the TMS320C6678 platform.




Key words: high-performance DSP, Gaussian filtering, vector parallel optimization, DMA transmission optimization
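
For readers unfamiliar with the serial baseline that the abstract compares against, the following is a minimal sketch of a plain 2D Gaussian filter. It is not the authors' implementation: the function names (gaussian_kernel, pad_replicate, gaussian_filter), the default kernel size and sigma, and the replicated-border policy are all assumptions made for illustration. Padding the image up front is one common way to keep border checks out of the inner loop; whether the paper's control flow elimination works this way is not stated in the abstract.

```python
import math


def gaussian_kernel(size, sigma):
    """Build a normalized size x size Gaussian kernel (size must be odd)."""
    if size % 2 == 0 or size < 1:
        raise ValueError("kernel size must be a positive odd integer")
    half = size // 2
    kernel = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    # Normalize so the weights sum to 1 and overall brightness is preserved.
    return [[v / total for v in row] for row in kernel]


def pad_replicate(image, half):
    """Extend the image by `half` pixels per side, repeating edge pixels.

    Assumed border policy; other choices (zero, mirror) are equally valid.
    """
    rows = [[row[0]] * half + list(row) + [row[-1]] * half for row in image]
    top = [rows[0][:] for _ in range(half)]
    bottom = [rows[-1][:] for _ in range(half)]
    return top + rows + bottom


def gaussian_filter(image, size=3, sigma=1.0):
    """Serial reference filter: one weighted sum per output pixel."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    padded = pad_replicate(image, half)
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = 0.0
            # Thanks to the padding, this loop needs no border branches.
            for ki in range(size):
                for kj in range(size):
                    acc += kernel[ki][kj] * padded[i + ki][j + kj]
            out[i][j] = acc
    return out
```

Calling gaussian_filter(image, size=5, sigma=1.5) on a list-of-lists grayscale image returns a smoothed image of the same dimensions. The innermost two loops are the part that techniques such as vectorization and loop unrolling would typically target.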