• Official journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2022, Vol. 44, Issue (11): 1909-1917.

• High Performance Computing

Optimization of dot product algorithms on FT-M7002

GUO Pan-pan1,2, CHEN Meng-xue3, LIANG Zu-da1,2, MA Xiao-chang3, XU Bang-jian4

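  (1. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan 450066;
    2. National Supercomputing Center in Zhengzhou (Zhengzhou University), Zhengzhou, Henan 450001;
    3. School of Electrical and Information Engineering, Hunan University, Changsha, Hunan 410082;
    4. School of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China)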
  • Received: 2022-02-07  Revised: 2022-04-01  Accepted: 2022-11-25  Published: 2022-11-25  Online: 2022-11-25

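Abstract: Targeting the domestically developed high-performance DSP of the FT-M7002 platform, this work optimizes and implements several types of dot product algorithms, completing this part of the platform's math library. To take full advantage of the FT-M7002 core architecture, the dot product algorithms are optimized with SIMD vector parallelization, DMA dual-channel transfer, SVR transfer, and related techniques. These optimizations exploit the vector parallelism of the programs, speed up data transfer, and improve overall performance. Experimental results show that, over input arrays of different sizes, the average performance ratio of the optimized to the unoptimized versions of the dot product algorithms on the FT-M7002 platform ranges from 12.4166 to 45.2338. Compared with the dot product functions of the dsplib library from TI's official website running on the TMS320C6678 processor, the optimized FT-M7002 implementations achieve an average performance ratio of 1.3716 to 4.5196. These results demonstrate the computational performance advantage of this DSP platform over the mainstream TI platform.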
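The abstract names SIMD vector parallelization and DMA dual-channel transfer without showing how they fit together. The following is a minimal portable C sketch of the general idea, not the authors' implementation and not FT-M7002 code. The names `LANES`, `CHUNK`, `dot_block` and `dot_double_buffered` are hypothetical. Lane-wise partial sums stand in for SIMD multiply-accumulate. The sketch also assumes that "dual-channel transfer" means alternating between two on-chip buffers (ping-pong buffering), which the abstract does not spell out. Plain `memcpy` stands in for the DMA engine, so in this sketch the transfers and the computation do not actually overlap.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parameters: LANES models the SIMD width and CHUNK models the
   size of one DMA transfer into on-chip memory. Neither value is taken from
   FT-M7002 documentation. */
#define LANES 8
#define CHUNK 64

/* Dot product of one on-chip block using LANES independent partial sums,
   a scalar analogue of SIMD multiply-accumulate across vector lanes. */
static float dot_block(const float *x, const float *y, size_t n)
{
    float acc[LANES] = {0.0f};
    size_t i = 0;
    for (; i + LANES <= n; i += LANES)
        for (size_t l = 0; l < LANES; ++l)
            acc[l] += x[i + l] * y[i + l];
    float sum = 0.0f;
    for (size_t l = 0; l < LANES; ++l)   /* horizontal reduction of lanes */
        sum += acc[l];
    for (; i < n; ++i)                   /* scalar tail */
        sum += x[i] * y[i];
    return sum;
}

/* Ping-pong (double) buffering: block k is computed from one buffer pair
   while block k+1 is loaded into the other. memcpy stands in for an
   asynchronous DMA transfer, so here copy and compute run sequentially.
   The static buffers make this function non-reentrant. */
float dot_double_buffered(const float *x, const float *y, size_t n)
{
    static float bx[2][CHUNK], by[2][CHUNK];   /* models on-chip buffers */
    float sum = 0.0f;
    if (n == 0)
        return sum;

    size_t len = n < CHUNK ? n : CHUNK;
    memcpy(bx[0], x, len * sizeof *x);          /* prefetch the first block */
    memcpy(by[0], y, len * sizeof *y);

    int cur = 0;
    for (size_t off = 0; off < n; off += CHUNK) {
        size_t cur_len = n - off < CHUNK ? n - off : CHUNK;
        size_t next = off + CHUNK;
        if (next < n) {                          /* "start" the next transfer */
            size_t next_len = n - next < CHUNK ? n - next : CHUNK;
            memcpy(bx[1 - cur], x + next, next_len * sizeof *x);
            memcpy(by[1 - cur], y + next, next_len * sizeof *y);
        }
        sum += dot_block(bx[cur], by[cur], cur_len);
        cur = 1 - cur;                           /* swap ping and pong */
    }
    return sum;
}
```

For example, with `x` filled with `1.0f` and `y` holding 0, 1, 2, 3 repeated over 1000 elements, `dot_double_buffered(x, y, 1000)` returns `1500.0f`. On real hardware, the `memcpy` calls would become asynchronous transfers issued before `dot_block` runs, with a wait on completion before each buffer swap. This is what lets data movement hide behind computation.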

Key words: FT-M7002, digital signal processor (DSP), dot product algorithm, vector, DMA dual-channel transfer, SVR transfer