A RL-based command and control algorithm for SoS confrontation simulation at the tactical level

Computer Engineering & Science

YAN Xuefei, LI Xinming, LIU Dong, LIU Desheng, LI Qiang

(Laboratory of Science and Technology on Complex Electronic System Simulation, Equipment Academy, Beijing 101416, China)

Received: 2017-04-24  Revised: 2017-06-28  Online: 2018-08-25  Published: 2018-08-25
Supported by:
Equipment Pre-Research Field Fund Project (61400010103); Key Laboratory Basic Research Project (DXZTJCZZ2015007).

Abstract

Traditional cognitive decision-making techniques cannot cope effectively with the uncertainty, unknowns, and complexity of the weapon system-of-systems (WSoS) confrontation environment. To address this, a command and control (C2) algorithm based on reinforcement learning (RL) is proposed for campaign-level WSoS confrontation simulation. The UML architecture of the WSoS is introduced, comprising reconnaissance, strike, communication, supply, repair, and C2 agents, and the self-developed battle simulation prototype system and its battle scenario are described. Building on a description of, and assumptions about, the cognitive domain of the campaign-level C2 agent, the improved Q-learning cognitive decision algorithm is explained in detail: its parameter normalization, the discretization of Q based on a GRBF neural network, the multi-step differencing mechanism based on the temporal-difference (TD) formula, and the learning and training process of the network structure. Finally, the effectiveness of the algorithm is verified through an integrated air-ground joint SoS confrontation simulation. Extensive visual playback analysis of the algorithm further shows that a certain degree of firepower coordination and uninterrupted tactical maneuver matter greatly for improving operational effectiveness and reducing damage.

Key words: weapon system-of-systems, battle simulation, reinforcement learning, GRBF neural network, cognitive decision-making
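
The three algorithmic ingredients named in the abstract (parameter normalization, a GRBF-based representation of Q, and a multi-step TD update) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that GRBF denotes a Gaussian radial basis function network, that Q is a linear combination of fixed Gaussian features with one weight vector per action, and that the multi-step mechanism is the standard n-step TD target. All names (`normalize`, `GaussianRBFQ`, `n_step_target`) and all numeric settings are hypothetical.

```python
import math


def normalize(x, lo, hi):
    """Min-max scale a raw situation parameter into [0, 1], clipping outliers."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


class GaussianRBFQ:
    """Q(s, a) as a weighted sum of Gaussian RBF features (assumed form)."""

    def __init__(self, centers, width, n_actions):
        self.centers = [tuple(c) for c in centers]  # fixed RBF centers
        self.width = width                          # shared Gaussian width
        # One weight per (action, center) pair, initialised to zero.
        self.weights = [[0.0] * len(self.centers) for _ in range(n_actions)]

    def features(self, state):
        """Gaussian activation of every center for a normalized state."""
        feats = []
        for c in self.centers:
            d2 = sum((s - ci) ** 2 for s, ci in zip(state, c))
            feats.append(math.exp(-d2 / (2.0 * self.width ** 2)))
        return feats

    def q(self, state, action):
        feats = self.features(state)
        return sum(w * f for w, f in zip(self.weights[action], feats))

    def max_q(self, state):
        return max(self.q(state, a) for a in range(len(self.weights)))

    def update(self, state, action, target, lr):
        """One gradient step moving Q(state, action) toward the TD target."""
        feats = self.features(state)
        error = target - sum(w * f for w, f in zip(self.weights[action], feats))
        for i, f in enumerate(feats):
            self.weights[action][i] += lr * error * f
        return error


def n_step_target(rewards, gamma, bootstrap_value):
    """n-step TD target: discounted reward sum plus gamma^n * bootstrap."""
    target = bootstrap_value
    for r in reversed(rewards):
        target = r + gamma * target
    return target


# Usage with made-up numbers: a 1-D state, three centers, two actions.
q_net = GaussianRBFQ(centers=[(0.0,), (0.5,), (1.0,)], width=0.25, n_actions=2)
state = (normalize(30.0, 0.0, 100.0),)
next_state = (normalize(80.0, 0.0, 100.0),)
target = n_step_target([1.0, 0.0, 2.0], gamma=0.9,
                       bootstrap_value=q_net.max_q(next_state))
for _ in range(200):
    q_net.update(state, 0, target, lr=0.5)
```

Bootstrapping from `max_q` of the state reached after n steps keeps the update in the Q-learning family. Because the Gaussian features overlap, each update also shifts the estimates of nearby states, which is how a function approximator of this kind generalizes across a continuous situation space.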

YAN Xuefei, LI Xinming, LIU Dong, LIU Desheng, LI Qiang. A RL-based command and control algorithm for SoS confrontation simulation at the tactical level[J]. Computer Engineering & Science.
