• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2020, Vol. 42 ›› Issue (10, High Performance special issue): 1791-1800.

• High-Performance Computer System Software •

以编译为导向的Matrix-DSP程序分析与优化

荀长庆,陈照云,文梅,孙海燕,马奕民   

  1. (国防科技大学计算机学院,湖南 长沙 410073)

  • Received: 2020-06-10 Revised: 2020-07-20 Accepted: 2020-10-25 Online: 2020-10-25 Published: 2020-10-23
  • Supported by:
    National Key Research and Development Program of China (2018YFB0204301)

Compilation-oriented code analysis and optimization for Matrix DSP

XUN Chang-qing,CHEN Zhao-yun,WEN Mei,SUN Hai-yan,MA Yi-min   

  1. (School of Computer,National University of Defense Technology,Changsha 410073,China)
  • Received:2020-06-10 Revised:2020-07-20 Accepted:2020-10-25 Online:2020-10-25 Published:2020-10-23

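Abstract: Digital signal processors (DSPs) are widely used in image processing, automatic control, signal processing, and many other fields. The independently developed Matrix DSP adopts a typical vectorized architecture that pairs single instruction, multiple data (SIMD) execution with very long instruction words (VLIW), so achieving efficient vectorized programming and optimization for this architecture is an important challenge. Starting from the architectural characteristics of Matrix DSP and taking compiler performance as the guide, this paper reviews and summarizes the analysis and optimization techniques commonly applied to kernel-level code and demonstrates them on a general matrix multiplication example, whose execution performance improves by up to one order of magnitude. Finally, it offers some follow-up thoughts and discussion from the perspectives of compiler optimization and efficient programming by programmers.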


Key words: Matrix DSP, vectorized programming, program optimization, compiler

Abstract: Digital signal processors (DSPs) are widely used in fields such as image processing, automation control, and signal processing. The independently developed Matrix DSP adopts a typical vectorized architecture combining Single Instruction Multiple Data (SIMD) with Very Long Instruction Word (VLIW). Implementing efficient vectorized programming and optimization for this architecture is therefore a significant challenge. Based on the architectural characteristics of Matrix DSP and guided by compilation performance, the analysis and optimization methods commonly used for kernel-level code are summarized. An example of general matrix multiplication (GEMM) then shows that execution performance can be improved by up to one order of magnitude. Building on this summary, some follow-up thoughts and discussion are offered from the perspectives of compiler optimization and efficient programming.


Key words: Matrix DSP, vectorized programming, program optimization, compiler