• Official journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2026, Vol. 48, Issue (1): 11-19.

• High Performance Computing •

A parallel evolution strategy framework for large-scale systems

ZHANG Han, WANG Xiaoping

  1. (College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China)
  • Received: 2024-04-11 Revised: 2024-09-23 Online: 2026-01-25 Published: 2026-01-25

Abstract: The evolution strategies (ES) algorithm is an efficient optimization algorithm suited to problems where gradient information is unavailable or difficult to obtain, and it is widely applied in tasks such as reinforcement learning and black-box optimization. As problems grow in scale and complexity, the sampling size of the ES algorithm grows as well, and the degree of computational parallelism increases accordingly. For large-scale systems, a new parallel ES algorithm framework is proposed that mainly addresses fault-tolerant computing and communication overhead during ultra-large-scale parallel execution of the algorithm. To address these issues, a high-concurrency reduction mechanism is introduced, along with a low-overhead fault-tolerance method tailored to the algorithm's characteristics. Experimental results show that the parallel efficiency of the new framework on large-scale systems stays above 54.7%, and that when the parallel scale rises to more than ten thousand nodes, its parallel efficiency is 23% higher than that of OpenAI-NES.

Key words: evolution strategies, black-box optimization, fault-tolerant computing, parallel computing
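
The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of the ideas it names, not the authors' framework. It assumes a basic ES update with mirrored (antithetic) sampling, simulates workers in a single process, stands in for the reduction with a plain sum of per-worker partial results, and models fault tolerance by simply skipping failed workers and normalizing by the surviving sample count. The function names, parameters, and the strong-scaling definition of parallel efficiency are all assumptions made for illustration.

```python
import numpy as np


def es_step(theta, fitness, rng, n_workers=8, samples_per_worker=16,
            sigma=0.1, lr=0.05, failed_workers=()):
    """One ES update (hypothetical sketch); failed workers are skipped."""
    grad_sum = np.zeros_like(theta)
    n_used = 0
    for w in range(n_workers):
        # Draw noise for every worker so the random stream does not
        # depend on which workers failed.
        eps = rng.standard_normal((samples_per_worker, theta.size))
        if w in failed_workers:
            continue  # assumed fault handling: drop this worker's samples
        # Mirrored sampling: evaluate theta + sigma*e and theta - sigma*e.
        f_pos = np.array([fitness(theta + sigma * e) for e in eps])
        f_neg = np.array([fitness(theta - sigma * e) for e in eps])
        # Local partial sum; a real system would combine these with a
        # reduction across nodes instead of this in-process loop.
        grad_sum += (f_pos - f_neg) @ eps
        n_used += samples_per_worker
    if n_used == 0:
        return theta  # no surviving samples: leave parameters unchanged
    grad = grad_sum / (2.0 * sigma * n_used)
    return theta + lr * grad  # gradient ascent on the fitness


def parallel_efficiency(t_serial, t_parallel, n_nodes):
    """Strong-scaling efficiency: speedup divided by node count."""
    return t_serial / (n_nodes * t_parallel)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = np.ones(5)
    theta = np.zeros(5)
    for _ in range(200):
        theta = es_step(theta, lambda x: -np.sum((x - target) ** 2), rng,
                        failed_workers=(2, 5))
    print(np.linalg.norm(theta - target))
    print(parallel_efficiency(100.0, 2.0, 64))
```

Because the update is an average over whichever samples arrive, dropping a failed worker's samples only raises the variance of the estimate. That property is what makes a low-overhead fault-tolerance scheme plausible for ES, although the paper's actual method may differ.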