• Journal of the China Computer Federation
• China Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2025, Vol. 47, Issue (5): 912-920.

• Artificial Intelligence and Data Mining •

An evolutionary reinforcement learning algorithm based on stochastic symmetric search

DI Jian1,2, WAN Xue1, JIANG Limei1,3

  1. Department of Computer, North China Electric Power University, Baoding 071003, China
  2. Hebei Key Laboratory of Knowledge Computing for Energy & Power, Baoding 071003, China
  3. Engineering Research Center of Intelligent Computing for Complex Energy Systems, Ministry of Education, Baoding 071003, China
  • Received: 2023-12-11 Revised: 2024-06-27 Online: 2025-05-25 Published: 2025-05-27
  • Funding:
    Fundamental Research Funds for the Central Universities, North China Electric Power University (2022MS102)

Abstract: The introduction of evolutionary algorithms has greatly improved the performance of reinforcement learning algorithms. However, existing evolutionary reinforcement learning (ERL) algorithms are still prone to being misled by deceptive rewards, tend to converge to local optima, and suffer from poor stability. To address these problems, a stochastic symmetric search strategy is proposed. It acts directly on the policy network parameters: starting from the center of the policy network parameters, the optimal policy network parameters guide the optimization and update of the global policy network parameters. It is supplemented by gradient optimization to guide the agent toward diversified exploration. Experimental results on five continuous-control robot locomotion tasks in MuJoCo show that the proposed algorithm outperforms previous evolutionary reinforcement learning algorithms and converges faster.

Key words: deep reinforcement learning, evolutionary algorithm, evolutionary reinforcement learning, stochastic symmetric search
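
The abstract describes the search only at a high level and gives no update rule. As a rough illustration of the general idea, the sketch below perturbs a parameter center with mirrored (symmetric) random noise and then moves the center toward the best candidate found. Everything in it is an assumption made for illustration: the names symmetric_search_step and toy_fitness, the parameters sigma, pairs, and lr, and the update rule itself. It is not the authors' algorithm, and the gradient-based component mentioned in the abstract is left out.

```python
import random


def toy_fitness(params):
    """Hypothetical stand-in for an episode return: highest at the origin."""
    return -sum(x * x for x in params)


def symmetric_search_step(center, fitness, sigma=0.1, pairs=5, lr=0.5, rng=None):
    """One illustrative update of a parameter center.

    Samples `pairs` random directions, evaluates each direction together
    with its mirror image, and moves the center a fraction `lr` of the way
    toward the best candidate. The center is kept if nothing beats it.
    """
    rng = rng or random.Random()
    candidates = []
    for _ in range(pairs):
        noise = [rng.gauss(0.0, sigma) for _ in center]
        # Symmetric sampling: center + noise and center - noise.
        candidates.append([c + n for c, n in zip(center, noise)])
        candidates.append([c - n for c, n in zip(center, noise)])
    best = max(candidates, key=fitness)
    if fitness(best) <= fitness(center):
        return list(center)
    # The best candidate guides the update of the center.
    return [c + lr * (b - c) for c, b in zip(center, best)]


if __name__ == "__main__":
    rng = random.Random(0)
    params = [2.0, -3.0]
    for _ in range(200):
        params = symmetric_search_step(params, toy_fitness, rng=rng)
    print(params, toy_fitness(params))
```

Evaluating each direction with its mirror image means that, for a smooth objective and small sigma, at least one of the two candidates usually improves on the center. In a reinforcement learning setting the fitness would be an episode return from the environment rather than a closed-form function.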