ParM: A heterogeneous programming model for domestic processors

Computer Engineering & Science, 2023, Vol. 45, Issue (9): 1521-1531.

ZHU Wen-long, JIANG Jia-zhi, HUANG Dan, XIAO Nong

(School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China)

Received: 2023-02-14  Revised: 2023-04-17  Online: 2023-09-25  Published: 2023-09-12
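Funding: National Key R&D Program of China (2021YFB0301300); National Natural Science Foundation of China (U1811461); Guangdong Basic and Applied Basic Research Foundation (2019B030302002); Guangdong Introducing Innovative and Entrepreneurial Teams Program (2016ZT06D211); Key-Area Research and Development Program of Guangdong Province (2021B0101190003); Zhejiang Lab Project (2021KC0AB04)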

Abstract

With the growing demand for computing power, a variety of domestically produced heterogeneous computing devices have emerged. Each device comes with its own specialized programming model, so developers must write code in that dedicated model and tailor it to the device's architectural characteristics; as a result, the code is not portable across devices. In recent years, unified heterogeneous parallel programming models that support multiple kinds of computing devices have appeared abroad, but research on and implementations of such models for domestic devices remain scarce. To address this gap, we have developed ParM, a performance-portable heterogeneous programming model. ParM is provided as a C++ library that hides many low-level implementation details and thereby lowers the difficulty of parallel programming. Its currently supported backends are x86 CPUs, NVIDIA GPUs, Huawei Kunpeng processors, and Huawei Ascend AI processors, and each backend has been performance-optimized. Performance tests on these devices show that ParM achieves more than 90% of the performance of native code.

Keywords: performance portability, parallel programming model, high-performance computing, heterogeneous computing, domestic processors
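The abstract does not show ParM's actual API. As a rough illustration of the single-source idea it describes (a kernel written once in C++ and dispatched to different backends chosen at compile time), here is a minimal sketch. All names (`Serial`, `Threads`, `parallel_for`, `saxpy`) are hypothetical and not taken from ParM, and only two host-side backends are shown; real NVIDIA GPU, Kunpeng, or Ascend backends would need their own implementations.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical backend tags (NOT ParM's API): overload resolution on the tag
// type selects a parallel_for implementation at compile time.
struct Serial {};
struct Threads {
    unsigned num_threads = std::max(1u, std::thread::hardware_concurrency());
};

// Serial backend: invoke the kernel for every index in [0, n).
template <typename Kernel>
void parallel_for(const Serial&, std::size_t n, Kernel&& kernel) {
    for (std::size_t i = 0; i < n; ++i) kernel(i);
}

// Multithreaded host backend: split [0, n) into contiguous chunks,
// one std::thread per chunk, and join them all before returning.
template <typename Kernel>
void parallel_for(const Threads& backend, std::size_t n, Kernel&& kernel) {
    const std::size_t t = backend.num_threads;
    const std::size_t chunk = (n + t - 1) / t;  // ceil(n / t); 0 only when n == 0
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&kernel, begin, end] {
            for (std::size_t i = begin; i < end; ++i) kernel(i);
        });
    }
    for (auto& w : workers) w.join();
}

// User code is written once; only the backend argument changes.
// Each index writes a distinct y[i], so the threaded backend has no data race.
template <typename Backend>
void saxpy(const Backend& backend, float a,
           const std::vector<float>& x, std::vector<float>& y) {
    parallel_for(backend, x.size(),
                 [&](std::size_t i) { y[i] = a * x[i] + y[i]; });
}
```

For example, `saxpy(Serial{}, 2.0f, x, y)` and `saxpy(Threads{4}, 2.0f, x, y)` compute the same result. Dispatching on a tag type keeps the kernel source identical across backends while letting each backend specialize its loop; a full library of the kind the abstract describes would also have to manage device memory and data movement, which this sketch omits.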

ZHU Wen-long, JIANG Jia-zhi, HUANG Dan, XIAO Nong. ParM: A heterogeneous programming model for domestic processors[J]. Computer Engineering & Science, 2023, 45(9): 1521-1531.