• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (04): 718-725.

• Artificial Intelligence and Data Mining •

A strategy search method based on particle swarm optimization and deep reinforcement learning

PENG Kun-yan, YIN Xiang, LIU Xiao-zhu, LI Heng-yu

  1. (School of Information Engineering (Artificial Intelligence), Yangzhou University, Yangzhou 225117, China)
  • Received:2021-07-08 Revised:2021-11-15 Accepted:2023-04-25 Online:2023-04-25 Published:2023-04-13

Abstract: Deep reinforcement learning (DRL) is a popular policy search method that has been successfully applied to a range of challenging control tasks. However, DRL is difficult to apply to large-scale practical problems because it handles sparse rewards poorly, lacks effective exploration, and has fragile convergence that is sensitive to hyperparameters. Particle swarm optimization (PSO) is an evolutionary optimization method that uses the cumulative reward of an entire episode as the fitness value, which makes it insensitive to sparse rewards. PSO also offers diverse, population-based exploration and stable convergence, but its sample efficiency is low. This paper combines PSO with policy-gradient-based DRL. DRL trains the policies with the lowest cumulative rewards in the population, using the diverse data provided by the PSO population. Each time training improves a policy's cumulative reward, that policy is inserted back into the PSO population, which strengthens the information exchange between DRL and the PSO population. The resulting algorithm, called PSO-RL, improves the sample efficiency of PSO and the performance and stability of DRL. Experiments on challenging continuous control tasks of the PyBullet module show that PSO-RL performs better than both DRL and the evolutionary reinforcement learning algorithm.

Key words: particle swarm optimization, strategy search, deep reinforcement learning, policy gradient, reinforcement learning
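
The loop described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It rests on several assumptions: the "policy" is a plain parameter vector, `episode_return` is a hypothetical toy objective standing in for the cumulative reward of an episode, and `gradient_step` is exact gradient ascent standing in for policy-gradient DRL training. The sharing of experience data between the PSO population and the DRL learner is omitted. All function names and hyperparameter values are illustrative.

```python
import random

TARGET = [1.0, -2.0, 0.5]  # hypothetical optimum of the toy objective


def episode_return(theta):
    """Toy stand-in for the cumulative reward of one episode (higher is better)."""
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))


def gradient_step(theta, lr=0.1, steps=5):
    """Stand-in for policy-gradient training: gradient ascent on the toy return."""
    theta = list(theta)
    for _ in range(steps):
        # d(return)/d(theta) = -2 * (theta - target), so ascent moves toward TARGET
        theta = [t - lr * 2.0 * (t - g) for t, g in zip(theta, TARGET)]
    return theta


def pso_rl(n_particles=8, n_iters=30, n_train=2, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(TARGET)
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    fit = [episode_return(p) for p in pos]
    pbest = [list(p) for p in pos]
    pbest_fit = list(fit)
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = list(pbest[best]), pbest_fit[best]
    history = [gbest_fit]

    for _ in range(n_iters):
        # Standard PSO velocity and position update; fitness is the episode return
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fit[i] = episode_return(pos[i])

        # Train the lowest-return particles; re-insert only those that improved
        worst = sorted(range(n_particles), key=lambda i: fit[i])[:n_train]
        for i in worst:
            candidate = gradient_step(pos[i])
            candidate_fit = episode_return(candidate)
            if candidate_fit > fit[i]:
                pos[i], fit[i] = candidate, candidate_fit

        # Refresh personal and global bests
        for i in range(n_particles):
            if fit[i] > pbest_fit[i]:
                pbest[i], pbest_fit[i] = list(pos[i]), fit[i]
            if fit[i] > gbest_fit:
                gbest, gbest_fit = list(pos[i]), fit[i]
        history.append(gbest_fit)

    return gbest, gbest_fit, history


if __name__ == "__main__":
    best_theta, best_return, _ = pso_rl()
    print(best_theta, best_return)
```

In this sketch the trained particle keeps its PSO velocity, so the swarm dynamics continue from the improved position. Whether the paper resets velocities or personal bests on insertion is not stated in the abstract.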