• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (1): 11-19.

• High Performance Computing •

A parallel evolution strategy framework for large-scale systems

ZHANG Han, WANG Xiaoping

  1. (College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China)
  • Received: 2024-04-11 Revised: 2024-09-23 Online: 2026-01-25 Published: 2026-01-25

Abstract: The evolution strategies (ES) algorithm is an efficient optimization method suited to problems where gradient information is unavailable or difficult to obtain. It is widely applied to tasks such as reinforcement learning and black-box optimization. As problems grow in scale and complexity, the sample size required by the ES algorithm grows as well, and with it the degree of computational parallelism. This paper proposes a new parallel ES framework for large-scale systems that targets two costs of running the algorithm at ultra-large parallel scale: fault-tolerant computing and communication overhead. To address them, the framework introduces a high-concurrency reduction mechanism and a low-overhead fault-tolerance method tailored to the characteristics of the algorithm. Experimental results show that the parallel efficiency of the new framework on large-scale systems exceeds 54.7%, and that when the parallel scale expands to tens of thousands of nodes, its parallel efficiency is 23% higher than that of OpenAI-NES.

Key words: evolution strategies, black-box optimization, fault-tolerant computing, parallel computing
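
For readers unfamiliar with ES, the following is a minimal single-process sketch of the kind of update the abstract refers to. It is not the framework proposed in the paper: the objective `sphere`, the helpers `noise`, `es_step`, and `run`, and all parameter values are assumptions chosen for illustration. The sketch uses a seed-sharing style common in distributed ES, in which each worker is identified by a random seed and only scalar fitness values need to be combined in the reduction step. The paper's high-concurrency reduction mechanism and fault-tolerance method are not reproduced here.

```python
import random

def sphere(x):
    """Toy black-box objective (an assumption): higher is better, maximum at 0."""
    return -sum(v * v for v in x)

def noise(seed, dim):
    """Regenerate a worker's Gaussian perturbation from its seed alone."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def es_step(theta, seeds, sigma=0.1, lr=0.05, objective=sphere):
    """One ES update with antithetic (mirrored) sampling.

    Each seed stands in for one worker. A worker only has to report
    scalar fitness values; the perturbation is rebuilt from the seed.
    """
    dim = len(theta)
    grad = [0.0] * dim
    for seed in seeds:  # in a parallel run, each iteration would be one worker
        eps = noise(seed, dim)
        f_plus = objective([t + sigma * e for t, e in zip(theta, eps)])
        f_minus = objective([t - sigma * e for t, e in zip(theta, eps)])
        # Reduction step: accumulate the fitness-weighted perturbations.
        for i in range(dim):
            grad[i] += (f_plus - f_minus) * eps[i]
    scale = lr / (2.0 * sigma * len(seeds))
    return [t + scale * g for t, g in zip(theta, grad)]

def run(iterations=200, dim=5, workers=16):
    """Drive the toy optimization from a fixed starting point."""
    rng = random.Random(0)
    theta = [1.0] * dim
    for _ in range(iterations):
        seeds = [rng.randrange(2**31) for _ in range(workers)]
        theta = es_step(theta, seeds)
    return theta
```

Calling `run()` moves the parameter vector toward the maximum of the toy objective. In a real parallel setting the loop over `seeds` would be distributed across nodes, which is where the communication and fault-tolerance costs discussed in the abstract arise.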