• Official journal of the China Computer Federation
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (06): 1081-1091.

• Artificial Intelligence and Data Mining •

A population diversity-based robust policy generation method in adversarial game environments

ZHUANG Shu-xin1, CHEN Yong-hong2, HAO Yi-hang2, WU Wei-wei1, XU Xue-yong3, WANG Wan-yuan1

  (1. School of Computer Science and Engineering, Southeast University, Nanjing 211189;
  2. Shenyang Aeroengine Design and Research Institute, Yangzhou Collaborative Innovation Research Institute Co., Ltd., Yangzhou 210016;
  3. Nanjing North Information Industrialization Group Co., Ltd., Nanjing 211189, China)
  • Received: 2023-10-12 Revised: 2023-12-05 Accepted: 2024-06-25 Online: 2024-06-25 Published: 2024-06-18

Abstract: In adversarial game environments, the target agent aims to generate a robust game policy that consistently achieves high returns against different opponent policies. Existing self-play-based policy generation methods often overfit to a specific opponent policy, which leaves the learned policy with low robustness and vulnerable to attacks from other opponent policies. In addition, existing methods that combine deep reinforcement learning and game theory to iteratively generate opponent policies converge slowly in complex adversarial scenarios with large decision spaces. To address these challenges, a population diversity-based robust policy generation method is proposed. In this method, both adversaries maintain a policy population pool, and diversity within the population is ensured so that a robust target policy can be generated. Policy diversity is measured from two perspectives: behavioral diversity and quality diversity. Behavioral diversity refers to the differences between the state-action trajectories of different policies, while quality diversity refers to the differences between the returns that different policies obtain when facing the same opponent. Finally, the robustness of the policies generated based on population diversity is validated in typical adversarial environments with continuous state-action spaces.
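
The two diversity notions described in the abstract can be illustrated with a minimal sketch. The abstract does not state the metrics the paper actually uses, so everything below is an assumption made for illustration: the function names, the time-aligned Euclidean distance for behavioral diversity, the mean absolute return gap for quality diversity, and the `weight` parameter that mixes the two are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def behavioral_distance(traj_a, traj_b):
    """Mean Euclidean distance between time-aligned state-action vectors.

    Assumed metric: each trajectory is a list of equal-length numeric tuples
    (state features followed by action features). Extra steps in the longer
    trajectory are ignored.
    """
    n = min(len(traj_a), len(traj_b))
    if n == 0:
        return 0.0
    total = 0.0
    for step_a, step_b in zip(traj_a, traj_b):
        total += sqrt(sum((x - y) ** 2 for x, y in zip(step_a, step_b)))
    return total / n


def quality_distance(returns_a, returns_b):
    """Mean absolute gap between two policies' returns.

    Assumed metric: returns_a[k] and returns_b[k] are the returns the two
    policies obtain against the same k-th opponent.
    """
    n = min(len(returns_a), len(returns_b))
    if n == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(returns_a, returns_b)) / n


def population_diversity(trajectories, returns, weight=0.5):
    """Average pairwise diversity over a policy population.

    `weight` (hypothetical) balances behavioral against quality diversity.
    """
    pairs = list(combinations(range(len(trajectories)), 2))
    if not pairs:
        return 0.0
    behavioral = sum(
        behavioral_distance(trajectories[i], trajectories[j]) for i, j in pairs
    ) / len(pairs)
    quality = sum(
        quality_distance(returns[i], returns[j]) for i, j in pairs
    ) / len(pairs)
    return weight * behavioral + (1.0 - weight) * quality


# Toy population of two policies; each step is (state, action).
traj_a = [(0.0, 0.0), (1.0, 0.0)]
traj_c = [(3.0, 4.0), (4.0, 4.0)]
score = population_diversity([traj_a, traj_c], [[1.0, 2.0], [1.0, 4.0]])
```

In a population-based training loop, a score of this kind could plausibly serve as a regularizer or selection criterion when adding new policies to the pool; how the paper actually uses its diversity measures is not specified in the abstract.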


Key words: