• Chinese Core Journals
  • EI
  • China Science and Technology Core Journals
  • Scopus
  • CSCD
  • Science Abstracts (UK)


The Autonomous Decision-making Method for Beyond Visual Range Air Combat Driven by Deep Reinforcement Learning

Lv Mao-Long, Wang Jin-He, Han Hao-Ran, Ding Chen-Bo, Wan Lu-Jun

Citation: Lv Mao-Long, Wang Jin-He, Han Hao-Ran, Ding Chen-Bo, Wan Lu-Jun. The autonomous decision-making method for beyond visual range air combat driven by deep reinforcement learning. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c250334


doi: 10.16383/j.aas.c250334 cstr: 32138.14.j.aas.c250334


Funds: Supported by the National Natural Science Foundation of China (62303489, GKJJ24050502), the China Postdoctoral Science Foundation General Program (2022M723877), the China Postdoctoral Science Foundation Special Funding (2023T160790), the China Postdoctoral International Exchange and Introduction Program (YJ20220347), the Youth Talent Support Program for Military Science and Technology (2022-JCJQ-QT-018), and the Key Project of the Natural Science Basic Research Program of Shaanxi Province (2025JC-QYCX-052)
More Information
    Author Bio:

    LV Mao-Long Associate Professor at Air Force Engineering University and a national-level young talent; Ph.D., Delft University of Technology, the Netherlands. His main research interests include cooperative strikes by swarms of unmanned aerial vehicles, manned-unmanned collaborative air combat, and intelligent air combat. Corresponding author of this paper. E-mail: maolonglv@163.com

    WANG Jin-He Master's student at Air Force Engineering University. His main research interest is intelligent air combat. E-mail: goldenriver2025@163.com

    HAN Hao-Ran Ph.D. candidate at the University of Electronic Science and Technology of China. His main research interests include reinforcement learning techniques and applications. E-mail: hanadam@163.com

    DING Chen-Bo Ph.D. candidate at Air Force Engineering University. His main research interest is manned-unmanned cooperative air combat. E-mail: chenbo_ding2024@163.com

    WAN Lu-Jun Associate Professor at Air Force Engineering University. His main research interests include intelligent airspace management, airspace conflict detection and deconfliction, and spatial grid management. E-mail: pandawlj@126.com

  • Abstract: With the rapid development of airborne sensors and medium- and long-range air-to-air missile technology, beyond-visual-range (BVR) air combat has become the dominant form of modern air combat. In such a complex and rapidly changing operational environment, developing intelligent techniques that grasp the battlefield situation in real time and produce sound maneuver decisions has become a hot topic in military technology research. First, a high-fidelity simulation environment is constructed that covers a six-degree-of-freedom aircraft dynamics model, a missile guidance system model, and a radar sensor system. Next, by combining imitation learning with self-play, an opponent-learning-based air combat decision-making framework is proposed to address the poor adaptability and generalization of deep reinforcement learning in air combat and to improve the agent's ability to adapt quickly and optimize its policy in complex, changing battlefield environments. Finally, expert systems with distinct tactical styles are constructed and pitted against the agent on the high-fidelity air combat simulation platform. The results show that the proposed decision-making framework outperforms conventional deep reinforcement learning policies on key metrics such as convergence speed and win rate, exhibits strong effectiveness and generalization, and can provide technical support for rapidly generating reliable strategies in complex BVR air combat situations.
    1) MAR: Minimum Abort Range. 2) MOR: Minimum Out Range. 3) TR: Target Range. 4) CR: Commit Range. 5) PR: Picture Range.
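    To make the framework described in the abstract concrete, the following is a minimal, hypothetical Python sketch of the training loop: an imitation-learning warm start from expert demonstrations, followed by self-play against a pool of frozen past policies. All names (Policy, pretrain_imitation, run_episode, the pool list) are illustrative placeholders, not the paper's code.

    import random
    from copy import deepcopy

    class Policy:
        """Placeholder policy; a real agent would wrap a neural network."""
        def __init__(self, weights=0.0):
            self.weights = weights

        def clone(self):
            return deepcopy(self)

    def pretrain_imitation(policy, demonstrations):
        # Imitation-learning warm start: fit the policy to expert
        # state-action pairs before any self-play (a supervised step).
        for _state, _action in demonstrations:
            policy.weights += 0.01  # stand-in for a gradient update
        return policy

    def run_episode(agent, opponent):
        # Stand-in for one BVR engagement in the simulator;
        # returns +1 if the agent wins, -1 if it loses, 0 for a draw.
        return random.choice([1, 0, -1])

    def train(demonstrations, iterations=100, snapshot_every=20):
        agent = pretrain_imitation(Policy(), demonstrations)
        pool = [agent.clone()]               # opponent pool seeded with the warm start
        for it in range(iterations):
            opponent = random.choice(pool)   # sample a past self as the opponent
            outcome = run_episode(agent, opponent)
            agent.weights += 0.001 * outcome # stand-in for an RL update (e.g., PPO)
            if (it + 1) % snapshot_every == 0:
                pool.append(agent.clone())   # freeze a snapshot to diversify opponents
        return agent

    if __name__ == "__main__":
        demos = [((0.0,), 0)] * 10           # dummy expert demonstrations
        trained = train(demos)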
  • Fig. 1  Proportional navigation guidance method

    Fig. 2  Operating range of the radar sensors

    Fig. 3  Illustration of the hybrid action space

    Fig. 4  Timeline diagram of BVR air combat

    Fig. 5  State-transition diagram of the tactical rules

    Fig. 6  Framework of the opponent-learning decision-making method

    Fig. 7  Experimental framework of the simulation environment

    Fig. 8  Comparison of training results

    Fig. 9  Win-rate table for the confrontation experiments

    Fig. 10  Adversarial process based on opponent learning
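
    The proportional navigation law of Fig. 1 is the classical one: commanded lateral acceleration $ a = N V_c \dot{\lambda} $, with navigation constant $ N $, closing speed $ V_c $, and line-of-sight rate $ \dot{\lambda} $. A minimal 2-D Python sketch under illustrative parameter values (not the paper's missile model):

    import math

    def pn_accel(N, r_x, r_y, v_x, v_y):
        """Commanded lateral acceleration a = N * Vc * LOS-rate for a 2-D pursuit,
        where (r_x, r_y) is the target position relative to the missile and
        (v_x, v_y) the relative velocity."""
        r2 = r_x**2 + r_y**2
        los_rate = (r_x * v_y - r_y * v_x) / r2             # d(lambda)/dt
        closing = -(r_x * v_x + r_y * v_y) / math.sqrt(r2)  # Vc = -dR/dt
        return N * closing * los_rate

    # Illustrative numbers: target 10 km ahead, 1 km offset, 300 m/s closure.
    a_cmd = pn_accel(N=4.0, r_x=10_000.0, r_y=1_000.0, v_x=-300.0, v_y=0.0)
    print(f"commanded acceleration: {a_cmd:.2f} m/s^2")     # about 3.5 m/s^2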

    Table 1  Parameters of the radar sensors

    Parameter                                    | Airborne search radar | Airborne fire-control radar | Radar warning system
    Aircraft transmit power (kW)                 | 30                    | 22                          | \
    Missile transmit power (kW)                  | \                     | \                           | 1
    Radar cross section of both sides ($ m^2 $)  | 4                     | 4                           | \
    Transmit antenna gain (dBi)                  | 42                    | 38                          | 20
    Receive antenna gain (dBi)                   | 42                    | 38                          | 30
    Signal wavelength ($ m $)                    | 0.037                 | 0.032                       | 0.024
    Azimuth (°)                                  | [−120, 120]           | [−60, 60]                   | [−180, 180]
    Elevation (°)                                | [−60, 60]             | [−15, 15]                   | [−90, 90]
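
    The Table 1 parameters plug directly into the classical radar range equation $ R_{\max} = \left( P_t G_t G_r \lambda^2 \sigma / ((4\pi)^3 S_{\min}) \right)^{1/4} $. A quick Python sanity check for the search-radar column; the minimum detectable signal $ S_{\min} $ is an assumed value, since the table does not give it:

    import math

    def radar_max_range(p_t, g_t_dbi, g_r_dbi, wavelength, rcs, s_min):
        """Classical radar range equation:
        R_max = ((P_t * G_t * G_r * lambda^2 * sigma) / ((4*pi)^3 * S_min))**0.25."""
        g_t = 10 ** (g_t_dbi / 10)   # dBi -> linear gain
        g_r = 10 ** (g_r_dbi / 10)
        num = p_t * g_t * g_r * wavelength**2 * rcs
        den = (4 * math.pi) ** 3 * s_min
        return (num / den) ** 0.25

    # Search-radar column of Table 1; S_min = 1e-13 W is an assumed sensitivity.
    r = radar_max_range(p_t=30e3, g_t_dbi=42, g_r_dbi=42,
                        wavelength=0.037, rcs=4, s_min=1e-13)
    print(f"max detection range: {r / 1e3:.0f} km")  # ~120 km under this assumption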

    Table 2  Tactical command layer action set

    No. | Tactical command | Maneuver parameters
    1   | Search           | four basic parameters, search radar switch-on parameter
    2   | Lock             | four basic parameters, fire-control radar switch-on parameter
    3   | Attack           | four basic parameters, missile launch parameter
    4   | Evade            | four basic parameters, warning radar switch-on parameter
    5   | Disengage        | four basic parameters
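
    Table 2 pairs a discrete tactical command with continuous maneuver parameters, which is exactly the hybrid action space of Fig. 3. A minimal sketch of such an action type follows; the field names are illustrative, and the contents of the "four basic parameters" are assumed here to be normalized continuous controls:

    import random
    from dataclasses import dataclass

    COMMANDS = ("search", "lock", "attack", "evade", "disengage")  # Table 2

    @dataclass
    class HybridAction:
        command: str   # discrete tactical command
        basic: tuple   # the four basic maneuver parameters (contents assumed)
        aux: float     # command-specific parameter, e.g., radar switch-on or launch

    def sample_action():
        """Uniformly sample one hybrid action: a discrete command plus
        continuous parameters."""
        return HybridAction(
            command=random.choice(COMMANDS),
            basic=tuple(random.uniform(-1.0, 1.0) for _ in range(4)),
            aux=random.uniform(0.0, 1.0),
        )

    print(sample_action())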

    Table 3  Reward events and value design

    Type              | Name                     | Value
    Key-event rewards | shooting down the target | 50
                      | draw                     | −20
                      | being shot down          | −50
                      | locking the target       | 10
                      | being locked             | −10
                      | launching a missile      | −5
    State rewards     | dangerous flight         | $ R_{\rm{df}} $
                      | threat                   | $ R_{\rm{th}} $
                      | advantage                | $ R_{\rm{ad}} $
                      | time step                | $ R_{\rm{ts}} $
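
    A sketch of how the Table 3 design could be wired into a per-step reward; the event flags and the shaping terms $ R_{\rm{df}} $, $ R_{\rm{th}} $, $ R_{\rm{ad}} $, $ R_{\rm{ts}} $ are placeholders for the paper's state-dependent definitions:

    # Key-event rewards from Table 3.
    EVENT_REWARDS = {
        "shoot_down": 50, "draw": -20, "shot_down": -50,
        "lock": 10, "locked": -10, "launch": -5,
    }

    def step_reward(events, r_df=0.0, r_th=0.0, r_ad=0.0, r_ts=0.0):
        """Sum the sparse event rewards and the dense state-shaping terms.
        `events` is an iterable of event names that fired this step; the
        keyword arguments stand in for the paper's R_df, R_th, R_ad, R_ts."""
        return sum(EVENT_REWARDS[e] for e in events) + r_df + r_th + r_ad + r_ts

    # Example: the agent locked the target and launched a missile this step.
    print(step_reward({"lock", "launch"}, r_ad=0.3, r_ts=-0.01))  # 10 - 5 + 0.29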

    Table 4  Timeline of BVR air combat

    Name                 | Symbol | Description                              | Key distance
    Minimum abort range  | MAR 1) | execute the Short Skate maneuver         | 30 km
    Minimum out range    | MOR 2) | execute the Skate maneuver               | 45 km
    Target range         | TR 3)  | perform target allocation                | 65 km
    Commit range         | CR 4)  | push forward to initiate an attack       | 80 km
    Picture range        | PR 5)  | acquire battlefield situation information | 120 km
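
    The key distances in Table 4 act as nested thresholds on target range. A tiny helper makes the intended phase logic explicit; the function and phase labels are illustrative:

    # Key distances (km) from Table 4, ordered from innermost to outermost.
    TIMELINE = [(30, "MAR"), (45, "MOR"), (65, "TR"), (80, "CR"), (120, "PR")]

    def timeline_phase(target_range_km):
        """Return the innermost timeline boundary the target has crossed,
        or 'outside PR' if it is beyond every key distance."""
        for threshold, name in TIMELINE:
            if target_range_km <= threshold:
                return f"inside {name}"
        return "outside PR"

    for r in (25, 50, 70, 100, 150):
        print(r, "km ->", timeline_phase(r))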

    Table 5  Typical situations in the rule-based BVR decision-making framework

    Situation $ S $ | Description                             | Tactical category
    $ s_1 $         | advancing outside PR                    | basic state
    $ s_2 $         | advancing between TR and PR             | offensive
    $ s_3 $         | advancing between MAR and TR            | offensive + defensive
    $ s_4 $         | returning to base                       | basic state
    $ s_5 $         | mission failed                          | basic state

    Table 6  Typical events in the rule-based BVR decision-making framework

    Event $ E $ | Description                               | Tactical category
    $ e_1 $     | target's radar switched on                | basic state
    $ e_2 $     | target enters the weapon engagement zone  | offensive
    $ e_3 $     | target locked                             | offensive
    $ e_4 $     | locked by the target                      | defensive
    $ e_5 $     | target killed                             | basic state
    $ e_6 $     | killed by the target                      | basic state
    $ e_7 $     | target escapes                            | basic state

    Table 7  Typical conditions in the rule-based BVR decision-making framework

    Condition $ C $ | Description                        | Tactical category
    $ c_1 $         | between MAR and TR                 | offensive
    $ c_2 $         | remaining missiles greater than 0  | offensive
    $ c_3 $         | attacked while outside MOR         | offensive
    $ c_4 $         | attacked while inside MAR          | defensive

    Table 8  Basic maneuvers in the rule-based BVR decision-making framework

    Action $ A $ | Name                  | Description                                 | Tactical category
    $ a_1 $      | level flight          | constant-speed flight for cruise and search | basic state
    $ a_2 $      | climb                 | gain altitude for an altitude advantage     | basic state
    $ a_3 $      | dive                  | lose altitude for a speed advantage         | basic state
    $ a_4 $      | turn-cold maneuver    | 180° turn placing the threat at the tail    | defensive
    $ a_5 $      | Skate maneuver        | launch outside MOR, then disengage          | offensive
    $ a_6 $      | Short Skate maneuver  | launch outside MAR, then disengage          | offensive
    $ a_7 $      | launch                | fire a missile                              | offensive

    Table 9  Rule set of the rule-based BVR decision-making framework

    No. | Current state S | Triggering event E | Condition C          | Action A | Next state S | Tactical category
    1   | $ s_1 $         |                    |                      | $ a_1 $  | $ s_1 $      | basic state
    2   | $ s_1 $         |                    |                      | $ a_1 $  | $ s_2 $      | basic state
    3   | $ s_1 $         | $ e_7 $            |                      |          | $ s_4 $      | basic state
    4   | $ s_2 $         | $ e_7 $            |                      |          | $ s_4 $      | basic state
    5   | $ s_2 $         | $ e_1 $            |                      | $ a_1 $  | $ s_2 $      | offensive
    6   | $ s_2 $         | $ e_3 $            |                      | $ a_1 $  | $ s_3 $      | offensive
    7   | $ s_3 $         | $ e_2 $            | $ c_1 $              | $ a_1 $  | $ s_3 $      | offensive
    8   | $ s_3 $         | $ e_2 $            | $ c_1 $              | $ a_2 $  | $ s_3 $      | offensive
    9   | $ s_3 $         | $ e_3 $            | $ {c_1} \cap {c_2} $ | $ a_7 $  | $ s_3 $      | offensive
    10  | $ s_3 $         | $ e_3 $            | $ \neg {c_2} $       | $ a_7 $  | $ s_4 $      | defensive
    11  | $ s_3 $         | $ e_5 $            |                      | $ a_4 $  | $ s_4 $      | basic state
    12  | $ s_3 $         | $ e_4 $            | $ c_3 $              | $ a_5 $  | $ s_3 $      | defensive + offensive
    13  | $ s_3 $         | $ e_4 $            | $ c_4 $              | $ a_6 $  | $ s_3 $      | defensive + offensive
    14  | $ s_3 $         | $ e_4 $            | $ c_3 $              | $ a_3 $  | $ s_3 $      | defensive
    15  | $ s_3 $         | $ e_4 $            | $ c_4 $              | $ a_4 $  | $ s_3 $      | defensive
    16  | $ s_3 $         | $ e_6 $            |                      |          | $ s_5 $      | basic state
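
    Table 9 is effectively a finite-state machine over (state, event, condition) triples. A minimal sketch of how such a rule table can be evaluated, transcribing rows 5, 6, 9, and 16; None marks an empty cell, and the condition predicate names are placeholders:

    # (current state, triggering event, condition, action, next state)
    RULES = [
        ("s2", "e1", None,        "a1", "s2"),   # rule 5
        ("s2", "e3", None,        "a1", "s3"),   # rule 6
        ("s3", "e3", "c1_and_c2", "a7", "s3"),   # rule 9
        ("s3", "e6", None,        None, "s5"),   # rule 16
    ]

    def step(state, event, conditions):
        """Return (action, next_state) for the first rule whose state, event
        and condition all match; `conditions` maps predicate names to booleans."""
        for s, e, c, a, s_next in RULES:
            if s == state and e == event and (c is None or conditions.get(c, False)):
                return a, s_next
        return None, state  # no rule fired: keep the current state

    # Locked the target inside MAR~TR with missiles remaining -> launch (rule 9).
    print(step("s3", "e3", {"c1_and_c2": True}))   # ('a7', 's3')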

    Table 10  Ten expert-system opponent designs for air combat

    Type             | Name        | Characteristic
    Attack-oriented  | Opponent 1  | double-missile strike in the first round
                     | Opponent 2  | high-speed closing surprise attack
                     | Opponent 3  | maintains altitude superiority for suppression
    Balanced         | Opponent 4  | energy-range trade-off
                     | Opponent 5  | optimized radar scan and lock
                     | Opponent 6  | altitude-speed offense-defense transitions
                     | Opponent 7  | multi-missile suppression with compound evasion
    Defense-oriented | Opponent 8  | patrol and evasion at TR
                     | Opponent 9  | patrol and evasion at MOR
                     | Opponent 10 | continuous turn-cold evasion
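
    During training the agent faces these ten scripted opponents alongside self-play snapshots. The sketch below shows one plausible curriculum-style sampler that favors the opponents the agent currently loses to most; this weighting scheme is an assumption for illustration, not the paper's stated method:

    import random
    from collections import defaultdict

    OPPONENTS = [f"opponent_{i}" for i in range(1, 11)]  # the ten designs of Table 10

    class OpponentSampler:
        """Sample opponents in proportion to the agent's current loss rate
        against each, so training focuses on the hardest adversaries."""
        def __init__(self, names):
            self.losses = defaultdict(lambda: 1.0)  # optimistic prior: everyone is hard
            self.games = defaultdict(lambda: 2.0)
            self.names = list(names)

        def sample(self):
            weights = [self.losses[n] / self.games[n] for n in self.names]
            return random.choices(self.names, weights=weights, k=1)[0]

        def update(self, name, won):
            self.games[name] += 1.0
            if not won:
                self.losses[name] += 1.0

    sampler = OpponentSampler(OPPONENTS)
    foe = sampler.sample()
    sampler.update(foe, won=False)   # record a loss -> foe sampled more often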
Publication history
  • Available online:  2026-02-13
