• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

计算机工程与科学 ›› 2022, Vol. 44 ›› Issue (12): 2128-2133.

• 高性能计算 • 上一篇    下一篇

面向多智能体博弈的并行蒙特卡洛树搜索算法研究

管延霞1,刘逊韵2,刘运韬1,谢旻1,徐新海2   

  1. (1.国防科技大学计算机学院,湖南 长沙 410073;2.军事科学院战争研究院,北京 100091)

  • 收稿日期:2021-04-02 修回日期:2021-09-24 接受日期:2022-12-25 出版日期:2022-12-25 发布日期:2022-12-25

A parallel Monte Carlo tree search algorithm for multi-agent game

GUAN Yan-xia1,LIU Xun-yun2,LIU Yun-tao1,Xie Min1,XU Xin-hai2   

  1. (1.College of Computer Science and Technology,National University of Defense Technology,Changsha 410073;
    2.War Research Institute,Academy of Military Sciences,Beijing 100091,China)
  • Received:2021-04-02 Revised:2021-09-24 Accepted:2022-12-25 Online:2022-12-25 Published:2022-12-25

摘要: 蒙特卡洛树搜索算法是一种常用的强化学习算法,博弈过程中动态空间的指数级增长是制约该算法学习效率的因素。基于并行方法对蒙特卡洛树搜索算法进行优化,提出基于胜率估值传递的并行蒙特卡洛树搜索算法。改进后的并行博弈搜索策略框架包含一个主进程和多个子进程,其中子进程用于探索,主进程根据子进程传递的胜率估值数据进行决策。结合多智能体博弈平台Pommerman进行实验验证,与传统的蒙特卡罗树搜索算法相比,并行蒙特卡罗树搜索算法有效提高了资源利用率、博弈胜率及决策效率。 

关键词: 多智能体博弈, Pommerman, 多进程, 并行蒙特卡洛树搜索

Abstract: Monte Carlo tree search algorithm is a commonly used reinforcement learning algorithm, and the exponential growth of the dynamic space of the algorithm in the game process has become a factor that restricts the improvement of the algorithm learning efficiency. Based on the parallel approach to optimize the Monte Carlo tree search algorithm, a parallel Monte Carlo tree search algorithm based on the transfer of winning rate estimate is proposed. The improved parallel game search strategy framework consists of one main process and several sub-processes, in which the sub-processes are used for exploration, and the main process makes decisions according to the winning rate estimate data transmitted by the sub-processes. Combined with the multi-agent game platform Pommerman for experimental validation, the parallel Monte Carlo tree search algorithm can enhance the resource utilization rate, game-winning rate, and decision-making efficiency over the traditional Monte Carlo tree search algorithm.

Key words: multi-agent game, Pommerman, multi-process, parallel Monte Carlo tree search