Runtime Partitioning Technique of Hybrid Distributed Shared Memory Space in Multicore Processors

J4 ›› 2012, Vol. 34 ›› Issue (7): 54-59.

CHEN Xiaowen1, CHEN Shuming1, LU Zhonghai2, Axel Jantsch2

(1. School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China;
2. Department of Electronic Systems, KTH Royal Institute of Technology, Stockholm 16440, Sweden)

Received: 2010-07-05 Revised: 2010-10-24 Online: 2012-07-25 Published: 2012-07-25
Funding:
National 863 Program of China (2009AA011704); Ministry of Education "High-Performance Microprocessor Technology" Innovative Research Team Program (IRT0614)

Abstract:

In multicore processors, Distributed Shared Memory (DSM) eases programming by maintaining a single global virtual memory space, but it also introduces the overhead of translating virtual memory addresses into physical memory addresses, which degrades performance. We observe that, in parallel applications, the property of the data being processed (private or shared) is not fixed: different data have different properties, and even the same datum may have different properties in different phases of program execution. This paper first describes a hybrid DSM that supports fast physical-address accesses to private data while maintaining globally addressed virtual-address accesses to shared data. It then proposes a runtime partitioning technique that adjusts the organization of the hybrid DSM space during program execution according to the properties of the data. Private data are accessed through fast physical addressing and shared data through conventional virtual addressing, which reduces the virtual-to-physical address translation overhead over the whole run and improves system performance. Experimental results on real applications show that the hybrid DSM with runtime partitioning outperforms its conventional DSM counterpart. The size of the improvement depends on the network size, the problem size, the way the parallel program is partitioned and mapped, and other factors. In our experiments, the maximal improvement is 13.14% and the minimal improvement is 6.98%.

Key words: address translation; data property; runtime partitioning; distributed shared memory; multicore processor
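The idea in the abstract can be illustrated with a toy cost model. The sketch below is not the authors' implementation: the class `HybridDSMNode`, the single movable boundary between private and shared regions, and the cycle counts `TRANSLATION_CYCLES` and `ACCESS_CYCLES` are all assumptions made for illustration. It only shows why moving data into a physically addressed private region at runtime removes the translation cost for those accesses.

```python
from dataclasses import dataclass

# Hypothetical cycle costs, chosen for illustration only (not from the paper).
TRANSLATION_CYCLES = 4   # assumed cost of one virtual-to-physical translation
ACCESS_CYCLES = 10       # assumed cost of the memory access itself


@dataclass
class HybridDSMNode:
    """Toy model of one node's local memory, split into private and shared parts."""
    mem_size: int
    boundary: int = 0  # addresses below `boundary` are private, the rest shared

    def set_boundary(self, boundary: int) -> None:
        """Repartition at runtime by moving the private/shared boundary."""
        if not 0 <= boundary <= self.mem_size:
            raise ValueError("boundary outside local memory")
        self.boundary = boundary

    def access_cost(self, addr: int) -> int:
        """Return the cycles spent on one access to local address `addr`."""
        if not 0 <= addr < self.mem_size:
            raise ValueError("address outside local memory")
        if addr < self.boundary:
            # Private data: physical addressing, no translation step.
            return ACCESS_CYCLES
        # Shared data: virtual addressing, translate before accessing.
        return TRANSLATION_CYCLES + ACCESS_CYCLES


def run_phase(node: HybridDSMNode, addrs) -> int:
    """Total cycles for a sequence of accesses in one program phase."""
    return sum(node.access_cost(a) for a in addrs)


if __name__ == "__main__":
    node = HybridDSMNode(mem_size=1024)
    addrs = range(100)
    print(run_phase(node, addrs))  # boundary = 0: every access is translated
    node.set_boundary(100)         # the data become private in this phase
    print(run_phase(node, addrs))  # the same accesses now skip translation
```

With the assumed costs, the 100 accesses take 1400 cycles when everything is treated as shared and 1000 cycles after the boundary is moved. These figures come from the made-up constants above and say nothing about the 6.98% to 13.14% improvements reported in the paper.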

CHEN Xiaowen, CHEN Shuming, LU Zhonghai, Axel Jantsch. Runtime Partitioning Technique of Hybrid Distributed Shared Memory Space in Multicore Processors[J]. J4, 2012, 34(7): 54-59.
