A parallel evolution strategy framework for large-scale system

Computer Engineering & Science, 2026, Vol. 48, Issue (1): 11-19.

ZHANG Han, WANG Xiaoping

(College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China)

Received: 2024-04-11  Revised: 2024-09-23  Online: 2026-01-25  Published: 2026-01-25


Abstract: Evolution strategies (ES) are efficient optimization algorithms for problems in which gradient information is unavailable or difficult to obtain. They are widely used in tasks such as reinforcement learning and black-box optimization. As problems grow in scale and complexity, ES algorithms draw ever larger numbers of samples, and the degree of computational parallelism grows accordingly. This paper proposes a new parallel ES framework for large-scale systems that addresses two problems of ultra-large-scale parallel execution: fault tolerance and communication overhead. To this end, it introduces a high-concurrency reduction mechanism and a low-overhead fault-tolerance method tailored to the characteristics of ES. Experiments show that the framework achieves a parallel efficiency above 54.7% on large-scale systems. When the parallel scale reaches ten thousand or more nodes, its parallel efficiency is 23% higher than that of OpenAI-NES.

Key words: evolution strategies, black-box optimization, fault-tolerant computing, parallel computing
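The abstract names a high-concurrency reduction mechanism and a low-overhead fault-tolerance method but does not describe how either one works. The Python sketch below is offered only as an illustration and is not the authors' framework. It shows a generic synchronous ES iteration with antithetic sampling, where the gradient estimate is (1/(2σn)) Σ_i [F(θ+σε_i) − F(θ−σε_i)] ε_i. The sketch makes several assumptions. Workers are simulated sequentially. Per-worker contributions are combined with a plain binary-tree reduction, which stands in for whatever reduction the paper actually uses. Fault tolerance is modeled by dropping the samples of failed workers and averaging over the survivors. This is valid because the estimate is a Monte Carlo mean, provided failures are unrelated to the sampled perturbations. The function names, the toy objective, and the fault model are all hypothetical.

```python
import random


def noise(seed, dim):
    """Gaussian perturbation regenerated deterministically from an integer seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def sphere(x):
    """Hypothetical toy black-box objective to maximize; optimum is the all-ones vector."""
    return -sum((xi - 1.0) ** 2 for xi in x)


def worker_contribution(theta, seed, sigma, objective):
    """Work of one (simulated) worker: evaluate the antithetic pair
    theta + sigma*eps and theta - sigma*eps and return (F+ - F-) * eps."""
    eps = noise(seed, len(theta))
    f_plus = objective([t + sigma * e for t, e in zip(theta, eps)])
    f_minus = objective([t - sigma * e for t, e in zip(theta, eps)])
    return [(f_plus - f_minus) * e for e in eps]


def tree_reduce(vectors):
    """Binary-tree sum of equal-length vectors in ceil(log2 p) rounds.
    Additions within one round are independent, so on a cluster each could be
    a message between two nodes running concurrently (here they run in sequence)."""
    level = list(vectors)
    while len(level) > 1:
        nxt = [
            [a + b for a, b in zip(level[i], level[i + 1])]
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element is carried into the next round
        level = nxt
    return level[0]


def es_step(theta, seeds, sigma, lr, objective, failed=frozenset()):
    """One synchronous ES iteration with antithetic sampling.

    Seeds listed in `failed` stand for workers that crashed or timed out:
    their samples are dropped and the gradient is averaged over the survivors,
    so the iteration proceeds without re-running the lost work."""
    contributions = [
        worker_contribution(theta, s, sigma, objective)
        for s in seeds
        if s not in failed
    ]
    if not contributions:
        return list(theta)  # no surviving samples: leave parameters unchanged
    grad_sum = tree_reduce(contributions)
    scale = lr / (2.0 * sigma * len(contributions))
    return [t + scale * g for t, g in zip(theta, grad_sum)]


def demo(iterations=200, workers=32, dim=5):
    """Optimize `sphere` while one different worker fails in every iteration."""
    theta = [0.0] * dim
    for it in range(iterations):
        seeds = [it * workers + w for w in range(workers)]  # fresh seeds each iteration
        failed = {seeds[it % workers]}  # hypothetical fault model: one lost worker
        theta = es_step(theta, seeds, sigma=0.1, lr=0.05, objective=sphere, failed=failed)
    return theta


if __name__ == "__main__":
    final = demo()
    print([round(v, 3) for v in final], round(sphere(final), 6))
```

Running demo() performs 200 iterations in which one worker is lost each time, and the parameters still move toward the all-ones optimum of the toy objective. A lost sample only means the Monte Carlo mean is computed over fewer terms. The OpenAI ES implementation of Salimans et al. (2017) goes further on the communication side: it shares random seeds so that workers exchange scalar fitness values rather than full vectors.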

ZHANG Han, WANG Xiaoping. A parallel evolution strategy framework for large-scale system[J]. Computer Engineering & Science, 2026, 48(1): 11-19.
