An evolutionary reinforcement learning algorithm based on stochastic symmetric search

Computer Engineering & Science, 2025, Vol. 47, Issue (5): 912-920.

DI Jian1,2, WAN Xue1, JIANG Limei1,3

(1. Department of Computer, North China Electric Power University, Baoding 071003, Hebei, China;
2. Hebei Key Laboratory of Knowledge Computing for Energy & Power, Baoding 071003, Hebei, China;
3. Engineering Research Center of Intelligent Computing for Complex Energy Systems, Ministry of Education, Baoding 071003, Hebei, China)

Received: 2023-12-11 Revised: 2024-06-27 Online: 2025-05-25 Published: 2025-05-27

Funding:
Fundamental Research Funds for the Central Universities, North China Electric Power University (2022MS102)

Abstract: Introducing evolutionary algorithms has greatly improved the performance of reinforcement learning algorithms. However, existing evolutionary reinforcement learning (ERL) algorithms are still easily misled by deceptive rewards, tend to converge to local optima, and suffer from poor stability. To address these problems, a stochastic symmetric search strategy is proposed. It acts directly on the policy network parameters: starting from the center of the policy network parameters, the best policy network parameters guide the optimization and update of the global policy network parameters, while gradient-based optimization is used as a supplement to steer the agent toward diverse exploration. Experimental results on five continuous-control robot locomotion tasks in MuJoCo show that the proposed algorithm outperforms previous evolutionary reinforcement learning algorithms and converges faster.

Key words: deep reinforcement learning, evolutionary algorithm, evolutionary reinforcement learning, stochastic symmetric search
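
The abstract describes the search strategy only at a high level, so the following is a minimal sketch of the general idea rather than the paper's actual update rule. It assumes that "symmetric" means sampling mirrored Gaussian perturbations around a parameter center, and that "guided by the best policy" means moving the center part of the way toward the best candidate. The names `symmetric_search_step`, `toy_fitness`, and `run`, along with the parameters `sigma`, `pairs`, and `step`, are all hypothetical. The gradient-based component mentioned in the abstract is omitted, and a toy quadratic function stands in for an episode return.

```python
import random


def symmetric_search_step(center, fitness, sigma=0.1, pairs=5, step=0.5, rng=random):
    """One illustrative iteration (an assumption, not the paper's exact rule).

    Samples `pairs` mirrored perturbations around `center`, then moves the
    center a fraction `step` of the way toward the best candidate found.
    Returns the new center and the list of sampled candidates.
    """
    candidates = []
    for _ in range(pairs):
        noise = [rng.gauss(0.0, sigma) for _ in center]
        candidates.append([c + n for c, n in zip(center, noise)])  # center + eps
        candidates.append([c - n for c, n in zip(center, noise)])  # center - eps
    best = max(candidates, key=fitness)
    if fitness(best) <= fitness(center):
        # No sampled candidate improves on the center: leave it unchanged.
        return list(center), candidates
    new_center = [c + step * (b - c) for c, b in zip(center, best)]
    return new_center, candidates


def toy_fitness(params):
    """Stand-in for an episode return; its maximum is at [1.0, -2.0]."""
    target = [1.0, -2.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def run(iterations=200, seed=0):
    """Repeat the search step from a fixed start with a seeded generator."""
    rng = random.Random(seed)
    center = [0.0, 0.0]
    for _ in range(iterations):
        center, _ = symmetric_search_step(center, toy_fitness, rng=rng)
    return center
```

In an ERL setting, `center` would be the flattened weights of a policy network and `fitness` would roll out the policy in an environment such as a MuJoCo task. Evaluating both `center + eps` and `center - eps` lets each noise sample probe two opposite directions in parameter space.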

DI Jian, WAN Xue, JIANG Limei. An evolutionary reinforcement learning algorithm based on stochastic symmetric search[J]. Computer Engineering & Science, 2025, 47(5): 912-920.
