Research on Multi-aircraft Cooperative Air Combat Method Based on Deep Reinforcement Learning

Shi Wei, Feng Yang-He, Cheng Guang-Quan, Huang Hong-Lan, Huang Jin-Cai, Liu Zhong, He Wei

Citation: Shi Wei, Feng Yang-He, Cheng Guang-Quan, Huang Hong-Lan, Huang Jin-Cai, Liu Zhong, He Wei. Research on multi-aircraft cooperative air combat method based on deep reinforcement learning. Acta Automatica Sinica, 2021, 47(7): 1610−1623 doi: 10.16383/j.aas.c201059


doi: 10.16383/j.aas.c201059
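Funds: Supported by National Natural Science Foundation of China (71701205, 62073333)
More Information
    Author Bio:

    SHI Wei  Master student at the College of Systems Engineering, National University of Defense Technology. Received a bachelor's degree from National University of Defense Technology in 2019. Research interests cover hierarchical reinforcement learning, multi-agent intelligent planning, and multi-agent deep reinforcement learning. E-mail: shiwei15@nudt.edu.cn

    FENG Yang-He  Associate professor at the College of Systems Engineering, National University of Defense Technology. Received master's and Ph.D. degrees from National University of Defense Technology. Research interests cover causal discovery and inference, active learning, and reinforcement learning. Corresponding author of this paper. E-mail: fengyanghe@nudt.edu.cn

    CHENG Guang-Quan  Associate research fellow at the College of Systems Engineering, National University of Defense Technology. Main research interest is link prediction. E-mail: cgq299@163.com

    HUANG Hong-Lan  Ph.D. candidate at the College of Systems Engineering, National University of Defense Technology. Research interests cover active learning and meta learning. E-mail: huanghonglan17@nudt.edu.cn

    HUANG Jin-Cai  Professor at the College of Systems Engineering, National University of Defense Technology. Main research interest is intelligent scheduling and control. E-mail: huangjincai@nudt.edu.cn

    LIU Zhong  Professor at the College of Systems Engineering, National University of Defense Technology. Research interests cover intelligent planning and decision-making, deep reinforcement learning, and multi-agent systems. E-mail: liuzhong@nudt.edu.cn

    HE Wei  Professor at the Institute of Artificial Intelligence and the School of Automation and Electrical Engineering, University of Science and Technology Beijing. Research interests cover robotics, vibration control, and intelligent control systems. E-mail: weihe@ieee.org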


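  • Abstract:

    Multi-aircraft cooperation is a key part of air combat. How to handle the complex cooperative relationships among multiple entities and achieve intelligent decision-making in multi-aircraft cooperative air combat is a problem that urgently needs to be solved. To this end, a deep-reinforcement-learning-based multi-aircraft cooperative air combat decision framework (DRL-MACACDF) is proposed, and four algorithm enhancement mechanisms are designed for the proximal policy optimization (PPO) algorithm to improve the degree of cooperation among agents in multi-aircraft cooperative confrontation scenarios. Simulation experiments on a wargaming platform verify the feasibility and practicality of the method. The confrontation process data are also replayed and analyzed for interpretability, and the cross-disciplinary research direction that combines reinforcement learning with traditional wargaming is discussed.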

    1) Version number: v1.4.1.0
  • Fig. 1  PPO algorithm training flow chart

    Fig. 2  Multi-aircraft collaborative air combat decision framework

    Fig. 3  Framework of centralized training and decentralized execution

    Fig. 4  Scenario diagram

    Fig. 5  Algorithm effectiveness comparison diagram

    Fig. 6  Performance comparison diagram of ablation experimental algorithm

    Fig. 7  Cumulative winning rate curve

    Fig. 8  Winning rate distribution map

    Fig. 9  Two-plane formation

    Fig. 10  Three-plane formation

    Fig. 11  Converging attack

    Fig. 12  Usage of maximum attack range

    Fig. 13  Fast maneuvers to avoid attack

    Fig. 14  Consume enemy ammunition

    Fig. A1  Diagrams of neural network

    Table 1  Experimental statistics of algorithm effectiveness

    Algorithm      Average score   Score standard deviation   Average win rate (%)
    DRL-MACACDF    18.929          10.835                     91.472
    PPO            −21.179         1.698                      0

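    Table 2  The setting of ablation experiment

    Columns: Model; Embedded expert experience reward mechanism; Experience sharing mechanism; Adaptive weight and prioritized sampling mechanism; Exploration encouragement mechanism
    Models: DRL-MACACDF, DRL-MACACDF-R, DRL-MACACDF-A, DRL-MACACDF-S, DRL-MACACDF-E
    Note: ● indicates that the mechanism is included, ○ indicates that it is not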

    Table 3  Statistics of ablation experimental results

    Model            Average score   Improvement in average score over traditional PPO (%)   Average win rate (%)
    DRL-MACACDF-R    −19.297130      8.327                                                   0
    DRL-MACACDF-A    13.629237       154.019                                                 86.774
    DRL-MACACDF-S    5.021890        115.934                                                 66.673
    DRL-MACACDF-E    8.973194        133.417                                                 82.361

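    Table A1  Experimental hyperparameter setting

    Parameter                                  Value
    Network optimizer                          Adam
    Learning rate                              $5 \times 10^{-5}$
    Discount rate                              0.9
    Clip ratio                                 0.2
    Number of samples before training starts   1400
    Experience buffer capacity                 3000 (samples)
    Batch size                                 200 (samples)
    Initial value of $\tau$                    1.0
    $\tau_{\rm step}$                          $1 \times 10^{-4}$
    $\tau_{\rm temp}$                          50000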
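    To make the role of the discount rate and clip ratio in Table A1 concrete, the following is a minimal sketch of two standard PPO building blocks: discounted returns and the clipped surrogate objective for a single sample. It illustrates the generic PPO formulation only and is not the authors' implementation; the function names and the per-sample scalar interface are assumptions made for this example.

```python
import math

GAMMA = 0.9        # discount rate, taken from Table A1
CLIP_RATIO = 0.2   # clip ratio, taken from Table A1


def discounted_returns(rewards, gamma=GAMMA):
    """Return the discounted return G_t for every step of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so that G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def clipped_surrogate(new_logp, old_logp, advantage, clip=CLIP_RATIO):
    """Standard PPO clipped objective for one sample (to be maximized)."""
    ratio = math.exp(new_logp - old_logp)            # pi_new(a|s) / pi_old(a|s)
    clipped = max(1.0 - clip, min(1.0 + clip, ratio))
    # Taking the minimum removes any incentive to move the ratio
    # outside [1 - clip, 1 + clip] in the direction the advantage favors.
    return min(ratio * advantage, clipped * advantage)
```

    With a clip ratio of 0.2, a probability ratio of 1.5 on a positive-advantage sample is treated as 1.2, which bounds how far a single update can push the policy.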

    Table A2  Entity type of scenario

    Unit type         Quantity   Main combat weapons
    F/A-18 fighter    2          4 × AIM-120D air-to-air missile; 2 × AGM-154C air-to-ground missile
    F-35C fighter     1          6 × AGM-154C air-to-ground missile
    Base              1          2 × F/A-18 fighter; 1 × F-35C fighter

    Table A3  The score of deduction events

    Deduction event          Score
    Destroy one aircraft     139
    Lose one aircraft        −139
    Destroy the base         1843
    Lose the base            −1843
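    The event scores in Table A3 can be read as a simple additive scoring rule. The sketch below sums the score of one engagement from a list of events; the event key names and the `episode_score` helper are hypothetical labels introduced here, and the platform's actual scoring interface is not described in this page.

```python
# Scores per deduction event, copied from Table A3.
# The dictionary keys are hypothetical labels chosen for this example.
EVENT_SCORES = {
    "destroy_aircraft": 139,
    "lose_aircraft": -139,
    "destroy_base": 1843,
    "lose_base": -1843,
}


def episode_score(events):
    """Sum the Table A3 scores over a sequence of event labels."""
    return sum(EVENT_SCORES[event] for event in events)


# Example: two kills, one loss, and the enemy base destroyed.
example = ["destroy_aircraft", "destroy_aircraft", "lose_aircraft", "destroy_base"]
print(episode_score(example))  # 139 + 139 - 139 + 1843 = 1982
```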

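    Table A4  State space information

    Entity              Information
    Friendly aircraft   Longitude, latitude, speed, heading, altitude, target point longitude, target point latitude (7 dimensions)
    Friendly missile    Longitude, latitude, speed, heading, altitude, longitude of the attacked target, latitude of the attacked target (7 dimensions)
    Enemy aircraft      Longitude, latitude, speed, heading, altitude (5 dimensions)
    Enemy missile       Longitude, latitude, speed, heading, altitude (5 dimensions)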
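    One way to turn the per-entity features of Table A4 into a fixed-length network input is to concatenate them with zero padding for entities that are absent. The sketch below does this under stated assumptions: the per-entity feature counts (7, 7, 5, 5) come from Table A4, while the entity caps in `MAX_ENTITIES`, the key names, and the padding scheme are hypothetical choices for illustration, not details confirmed by the source.

```python
# Feature count per entity type, from Table A4.
FEATURE_DIMS = {
    "own_aircraft": 7,
    "own_missile": 7,
    "enemy_aircraft": 5,
    "enemy_missile": 5,
}

# Hypothetical caps on how many entities of each type are encoded.
MAX_ENTITIES = {
    "own_aircraft": 3,
    "own_missile": 4,
    "enemy_aircraft": 3,
    "enemy_missile": 4,
}


def build_state(observed):
    """Flatten observed entities into one zero-padded state vector.

    `observed` maps an entity type to a list of feature lists.
    """
    state = []
    for kind, dim in FEATURE_DIMS.items():
        rows = list(observed.get(kind, []))[:MAX_ENTITIES[kind]]
        for row in rows:
            if len(row) != dim:
                raise ValueError(f"{kind} expects {dim} features, got {len(row)}")
            state.extend(float(value) for value in row)
        # Pad the slots of entities that are not currently observed.
        missing = MAX_ENTITIES[kind] - len(rows)
        state.extend([0.0] * (missing * dim))
    return state
```

    With these caps the vector has 3 × 7 + 4 × 7 + 3 × 5 + 4 × 5 = 84 entries, and the length stays constant as missiles appear and disappear during an engagement.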

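    Table A5  Action space information

    Category                   Value range
    Flight heading             0°, 60°, 120°, 180°, 240°, 300°
    Flight altitude            7620 m, 10973 m, 15240 m
    Flight speed               Low speed, cruise, afterburner
    Automatic firing distance  35, 40, 45, 50, 60, 70 nautical miles
    Missile salvo size         1 missile, 2 missiles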
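    The discrete choices in Table A5 combine into a finite joint action set per aircraft. The sketch below enumerates that set and decodes a flat index back into a command; the value lists come from Table A5, but the flat indexing and the `decode_action` helper are assumptions for illustration. The paper may instead use one output head per category.

```python
from itertools import product

# Value ranges copied from Table A5.
HEADINGS_DEG = (0, 60, 120, 180, 240, 300)
ALTITUDES_M = (7620, 10973, 15240)
SPEEDS = ("low", "cruise", "afterburner")
FIRE_RANGES_NMI = (35, 40, 45, 50, 60, 70)
SALVO_SIZES = (1, 2)

# Every combination of the five categories, in a fixed order.
ACTIONS = list(product(HEADINGS_DEG, ALTITUDES_M, SPEEDS, FIRE_RANGES_NMI, SALVO_SIZES))


def decode_action(index):
    """Map a flat action index to a readable command dictionary."""
    heading, altitude, speed, fire_range, salvo = ACTIONS[index]
    return {
        "heading_deg": heading,
        "altitude_m": altitude,
        "speed": speed,
        "auto_fire_range_nmi": fire_range,
        "salvo_size": salvo,
    }


print(len(ACTIONS))  # 6 * 3 * 3 * 6 * 2 = 648
```

    A flat index keeps the policy output a single categorical distribution, at the cost of 648 logits per aircraft; a factored output would need only 6 + 3 + 3 + 6 + 2 = 20.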
Figures (15) / Tables (8)
Publication history
  • Received: 2020-12-24
  • Published online: 2021-05-10
  • Issue date: 2021-07-27
