• Official journal of the China Computer Federation
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (01): 10-17.

• High Performance Computing •

Structure optimization of second-level Cache in DSP processor

AN Xinchen (安昕辰)

  (1. School of Electronic & Information Engineering, Nanjing University of Information Science & Technology, Nanjing 210044;
   2. 58th Research Institute, China Electronics Technology Group Corporation, Wuxi 214072, China)

  • Received: 2023-10-30 Revised: 2024-03-22 Accepted: 2025-01-25 Online: 2025-01-25 Published: 2025-01-18

Abstract: In recent years, emerging applications in fields such as autonomous driving, medical instruments, and smart homes have placed higher demands on the real-time performance and data throughput of DSP processors. Multi-level cache structures in DSPs introduce latency uncertainty through cache misses and coherency maintenance. To alleviate the performance degradation caused by long-latency accesses, this paper proposes merging the miss status holding registers (miss buffer) and the victim buffer (eviction buffer) into a single structure whose entries are flexibly assigned to either function at runtime, which improves buffer utilization. To address the low efficiency of synchronizing coherency maintenance information between the L1 Cache and the L2 Cache, it proposes exploiting the contiguity of invalidated addresses to synchronize invalidation information to the snoop filter without blocking. Test results show that the performance of a producer-consumer program with many dirty-data updates improves by 19.91%, and that the synchronization time for 32 lines of invalidation information drops from 61 clock cycles to 16 clock cycles.

Key words: digital signal processor (DSP), L2 Cache, pipeline, coherency
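
The merged-buffer idea in the abstract can be illustrated with a small behavioral sketch. This is not the paper's design: the class name `UnifiedBuffer`, the `Role` enum, the entry count of 8, and the addresses are all hypothetical, chosen only to show how a shared pool lets any free entry serve either as a miss-tracking entry or as an eviction entry at runtime.

```python
from enum import Enum


class Role(Enum):
    FREE = 0   # entry is unused
    MISS = 1   # entry tracks an outstanding miss (MSHR role)
    EVICT = 2  # entry holds a victim line awaiting write-back


class UnifiedBuffer:
    """Hypothetical shared pool: every entry can take either role."""

    def __init__(self, num_entries):
        self.roles = [Role.FREE] * num_entries
        self.addrs = [None] * num_entries

    def allocate(self, role, addr):
        """Claim the first free entry for `role`; return its index, or None if full."""
        for i, current in enumerate(self.roles):
            if current is Role.FREE:
                self.roles[i] = role
                self.addrs[i] = addr
                return i
        return None

    def release(self, index):
        """Return an entry to the free pool once its request completes."""
        self.roles[index] = Role.FREE
        self.addrs[index] = None

    def count(self, role):
        """Number of entries currently assigned to `role`."""
        return sum(1 for current in self.roles if current is role)


# Assumed workload: a burst of 6 evictions followed by 2 misses.
buf = UnifiedBuffer(8)
evict_ids = [buf.allocate(Role.EVICT, 0x1000 + 64 * i) for i in range(6)]
miss_ids = [buf.allocate(Role.MISS, 0x8000 + 64 * i) for i in range(2)]
overflow = buf.allocate(Role.MISS, 0xF000)  # pool is full at this point
```

With a fixed split of, say, 4 miss entries and 4 eviction entries (an assumed baseline, not one taken from the paper), the fifth eviction in this burst would have to wait even though miss entries sit idle; the shared pool accepts all six and only refuses a request once every entry is in use.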