Design and optimization of scalar memory access unit in VLIW DSPs

Computer Engineering & Science, 2023, Vol. 45, Issue (11): 1929-1940.

ZHENG Kang, LI Chen, CHEN Hai-yan, LIU Sheng, FANG Liang

(College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)

Received: 2022-10-19 Revised: 2023-02-23 Online: 2023-11-25 Published: 2023-11-16

Funding: National Natural Science Foundation of China (62202478); Scientific Research Project of National University of Defense Technology (ZK20-04)

Abstract: In recent years, as integrated circuit technology has advanced, the speed gap between processors and memory has kept widening, and memory has increasingly become the bottleneck that limits the performance of computing systems. DSPs for embedded and low-power applications differ from general-purpose CPUs in both architecture and application scenarios, so memory access designs developed for CPUs cannot meet the memory access requirements of DSPs. To address the requirements of Very Long Instruction Word (VLIW) DSPs for real-time memory access, in-order return with fixed latency, and efficient data consistency, a scalar memory access unit suited to DSPs is designed. Its configurable design meets the real-time memory access requirements of DSPs. An ID-based ordering mechanism guarantees that the data returned by Load instructions arrives in order and with the fixed latency that the VLIW architecture requires, at a storage overhead of 87.5 B. A hardware search for the leading one ("first 1") accelerates the write-back operations needed for data consistency. When 25%, 50%, and 75% of the cache lines need to be written back, the optimized consistency write-back takes 26.4%, 51.3%, and 76.2%, respectively, of the time taken by the line-by-line scan method. Its cost is proportional only to the number of valid dirty lines and is independent of the cache capacity.

Key words: scalar memory access unit, digital signal processor (DSP), very long instruction word (VLIW)
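The abstract does not describe how the ID-based ordering mechanism works internally. As a minimal sketch, assuming a generic reorder scheme rather than the authors' actual hardware, the idea of tagging each Load with an ID and releasing its data strictly in issue order can be modeled as a small circular buffer. The class `LoadReorderBuffer`, its `depth` parameter, and the methods `issue`, `complete`, and `retire` are hypothetical names chosen for illustration; the sketch models neither the 87.5 B storage budget nor cycle-level timing.

```python
"""Sketch: ID-based in-order return of Load data.

Assumptions (illustrative, not the paper's design): each issued Load gets a
sequential ID modulo a fixed buffer depth; the memory side may complete Loads
out of order; data is handed back to the core strictly in issue order.
"""
from typing import List, Optional


class LoadReorderBuffer:
    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.slots: List[Optional[int]] = [None] * depth  # returned data, indexed by ID
        self.valid = [False] * depth  # True once data for that ID has arrived
        self.busy = [False] * depth   # True while the ID is allocated to a Load
        self.head = 0                 # oldest outstanding ID
        self.tail = 0                 # next ID to allocate
        self.count = 0                # number of outstanding Loads

    def issue(self) -> int:
        """Allocate the next ID for a new Load; the caller must stall if full."""
        if self.count == self.depth:
            raise RuntimeError("buffer full: pipeline must stall")
        load_id = self.tail
        self.busy[load_id] = True
        self.valid[load_id] = False
        self.tail = (self.tail + 1) % self.depth
        self.count += 1
        return load_id

    def complete(self, load_id: int, data: int) -> None:
        """Record data returned by the memory side, possibly out of order."""
        assert self.busy[load_id], "completion for an unallocated ID"
        self.slots[load_id] = data
        self.valid[load_id] = True

    def retire(self) -> Optional[int]:
        """Return the oldest Load's data if it has arrived, else None (wait)."""
        if self.count == 0 or not self.valid[self.head]:
            return None
        data = self.slots[self.head]
        self.busy[self.head] = False
        self.valid[self.head] = False
        self.head = (self.head + 1) % self.depth
        self.count -= 1
        return data


if __name__ == "__main__":
    rob = LoadReorderBuffer(depth=4)
    a, b, c = rob.issue(), rob.issue(), rob.issue()
    rob.complete(c, 30)                 # youngest Load returns first (e.g. a hit)
    rob.complete(a, 10)
    print(rob.retire(), rob.retire())   # prints: 10 None  (b has not returned yet)
    rob.complete(b, 20)
    print(rob.retire(), rob.retire())   # prints: 20 30
```

A `None` from `retire` marks the point where an in-order VLIW pipeline would have to wait for the oldest Load instead of consuming a younger one that happened to return first; how the actual unit enforces the fixed latency in that case is not described in the abstract.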
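To illustrate why a leading-one search makes the write-back cost depend on the number of dirty lines rather than on the cache capacity, the sketch below compares a line-by-line scan with a loop that repeatedly locates and clears the highest set bit of a dirty-bit vector. It assumes, for illustration only, one dirty bit per cache line packed into a single integer, a cost of one step per probed line, and Python's `int.bit_length()` standing in for a hardware priority encoder; the function names are hypothetical.

```python
"""Sketch: dirty-line write-back by linear scan vs. leading-one search.

Assumptions (not from the paper): one dirty bit per cache line, packed into a
single integer bit vector; each probed line costs one step.
"""
from typing import List, Tuple


def scan_writeback(dirty: int, num_lines: int) -> Tuple[List[int], int]:
    """Probe every line in order; return (lines written back, lines probed)."""
    written: List[int] = []
    probes = 0
    for line in range(num_lines):
        probes += 1
        if (dirty >> line) & 1:
            written.append(line)
    return written, probes


def leading_one_writeback(dirty: int) -> Tuple[List[int], int]:
    """Jump straight to the highest set dirty bit each step, then clear it."""
    written: List[int] = []
    probes = 0
    while dirty:
        line = dirty.bit_length() - 1  # index of the leading (highest) 1 bit
        written.append(line)
        dirty &= ~(1 << line)          # clear it so the next search finds the next 1
        probes += 1
    return written, probes


if __name__ == "__main__":
    num_lines = 16
    dirty = 0b1000_0000_0101_0001      # lines 0, 4, 6 and 15 are dirty (25% of 16)
    s_lines, s_probes = scan_writeback(dirty, num_lines)
    l_lines, l_probes = leading_one_writeback(dirty)
    print(sorted(s_lines) == sorted(l_lines), s_probes, l_probes)  # prints: True 16 4
```

In this idealized model the probe count falls from the total number of lines to the number of dirty lines, giving ratios of exactly 25%, 50%, and 75%; the measured 26.4%, 51.3%, and 76.2% reported in the abstract come from the real design, which this sketch does not model.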

ZHENG Kang, LI Chen, CHEN Hai-yan, LIU Sheng, FANG Liang. Design and optimization of scalar memory access unit in VLIW DSPs[J]. Computer Engineering & Science, 2023, 45(11): 1929-1940.
