• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (11): 2053-2062.

• Artificial Intelligence and Data Mining •

Inverse reinforcement learning algorithm based on D2GA

DUAN Cheng-long, YUAN Jie, CHANG Qian-kun, ZHANG Ning-ning

  1. (School of Electrical Engineering, Xinjiang University, Urumqi 830017, China)

  • Received: 2023-09-12 Revised: 2024-02-20 Accepted: 2024-11-25 Online: 2024-11-25 Published: 2024-11-27
  • Funding:
    National Natural Science Foundation of China (62263031); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C53)

Abstract: To address the difficulty of obtaining expert demonstrations and the low utilization of generated samples in traditional generative adversarial inverse reinforcement learning, a double-discriminator generative adversarial (D2GA) inverse reinforcement learning algorithm based on hindsight experience replay (HER) is proposed. In this algorithm, HER automatically synthesizes expert-like positive samples, which are trained adversarially, through D2GA, against negative samples generated by the reinforcement learning method soft actor-critic (SAC). Based on the resulting optimal reward function, SAC is then used to solve for the optimal policy. The proposed D2GA algorithm is compared with classical inverse reinforcement learning algorithms on four tasks in the Fetch robotic-arm environment. The results show that, without any available demonstration data, D2GA reaches the desired task success rate within relatively few episodes and outperforms currently popular inverse reinforcement learning algorithms.


Key words: deep reinforcement learning, hindsight experience replay, inverse reinforcement learning, generative adversarial network
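
The abstract states that HER synthesizes expert-like positive samples but does not say which relabeling strategy the authors use. As a minimal sketch, assuming the standard "final" HER strategy, a failed episode can be turned into a successful one by substituting the goal that was actually reached for the goal that was requested. The `Transition` type, the `relabel_final` and `is_success` helpers, and the 0.05 distance tolerance are all hypothetical names and values chosen for illustration; they are not taken from the paper, and the D2GA discriminators are not modeled here.

```python
import math
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """One goal-conditioned step (hypothetical layout, for illustration only)."""
    state: Vec
    action: Vec
    achieved_goal: Vec  # goal-space position reached after taking the action
    desired_goal: Vec   # goal the agent was asked to reach


def is_success(achieved: Vec, goal: Vec, tol: float = 0.05) -> bool:
    """A step counts as successful if the achieved goal lies within tol of the goal."""
    # tol is an assumed threshold, not a value reported in the abstract.
    return math.dist(achieved, goal) <= tol


def relabel_final(episode: List[Transition]) -> List[Transition]:
    """HER 'final' strategy: treat the last achieved goal as if it had been the target."""
    if not episode:
        return []
    new_goal = episode[-1].achieved_goal
    # Copy every transition, changing only the desired goal.
    return [replace(t, desired_goal=new_goal) for t in episode]


# A two-step episode that ends far from the requested goal (1.0,).
episode = [
    Transition(state=(0.0,), action=(0.1,), achieved_goal=(0.1,), desired_goal=(1.0,)),
    Transition(state=(0.1,), action=(0.1,), achieved_goal=(0.2,), desired_goal=(1.0,)),
]
relabeled = relabel_final(episode)
```

After relabeling, the final transition satisfies its own (substituted) goal, so the trajectory can play the role of a positive sample even though no human demonstration was collected. How such samples are then fed to the two discriminators is specific to the paper and is not reproduced in this sketch.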