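Integrated Guidance and Control of Flight Vehicles by Fusing Evolutionary Algorithms and Deep Reinforcement Learning

Chen Jian-Guo, Yao Wei-Ran, Sun Guang-Hui, Wu Li-Gang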

Citation: Chen Jian-Guo, Yao Wei-Ran, Sun Guang-Hui, Wu Li-Gang. Integrated guidance and control of flight vehicles by fusing evolutionary algorithms and deep reinforcement learning. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c250278

doi: 10.16383/j.aas.c250278 cstr: 32138.14.j.aas.c250278
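Funds: Supported by the National Natural Science Foundation of China (62033005, 62473109) and the Heilongjiang Provincial Science and Technology Talent Support Program (CYQN24036)
More Information
    Author Bio:

    CHEN Jian-Guo  Ph.D. candidate at the School of Astronautics, Harbin Institute of Technology. His main research interest is autonomous navigation and control of unmanned aerial vehicles. E-mail: chenjianguo_hit@163.com

    YAO Wei-Ran  Professor at the School of Astronautics, Harbin Institute of Technology. His research interests include multi-robot task planning and intelligent control of unmanned systems. E-mail: yaoweiran@hit.edu.cn

    SUN Guang-Hui  Professor at the School of Astronautics, Harbin Institute of Technology. His main research interest is intelligent control of unmanned systems. E-mail: guanghuisun@hit.edu.cn

    WU Li-Gang  Professor at the School of Astronautics, Harbin Institute of Technology. His research interests include switched stochastic systems, sliding mode control, and autonomous intelligent unmanned systems. Corresponding author of this paper. E-mail: ligangwu@hit.edu.cn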


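  • Abstract: To address the guidance and control problem of hypersonic flight vehicles subject to external disturbances and model uncertainties, an evolutionary reinforcement learning framework is proposed that fuses the twin delayed deep deterministic policy gradient (TD3) algorithm with the cross-entropy method (CEM). First, the motion model and the integrated guidance and control model of the hypersonic flight vehicle are constructed. Second, the multi-constraint control problem in a complex disturbance environment is transformed into a reinforcement learning decision optimization process; relying on the model-free, data-driven nature of deep reinforcement learning, an end-to-end mapping from state observations to rudder deflection commands is established. In addition, a CEM-based action-space sampling mechanism is introduced: elite candidate actions are selected by a Q-value maximization criterion, and the value function guides the direction of the evolutionary search, which overcomes the inefficient and blind exploration of conventional reinforcement learning and improves sample efficiency. Finally, simulation results show that the proposed algorithm adapts to varying mission flight conditions, including initial altitude deviations of ±300 m, velocity deviations of ±200 m/s, and aerodynamic parameter uncertainties of ±40%, and significantly outperforms conventional control methods in core metrics such as terminal control accuracy and robustness.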
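The CEM-based action sampling described in the abstract can be illustrated with a minimal sketch. This is a generic cross-entropy loop scored by a Q function, not the authors' implementation: the abstract does not state the population size, elite count, iteration count, or how the search is coupled to TD3, so `n_samples`, `n_elite`, `n_iters`, `init_std`, and the stand-in critic `toy_q` are all hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def cem_select_action(
    q_value: Callable[[Sequence[float]], float],
    mean: Sequence[float],
    n_samples: int = 32,      # assumed population size
    n_elite: int = 8,         # assumed number of elite candidates
    n_iters: int = 3,         # assumed number of CEM iterations
    init_std: float = 0.3,    # assumed initial sampling spread
    low: float = -1.0,
    high: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Search the action space around `mean` for a high-Q action."""
    rng = rng or random.Random()
    dim = len(mean)
    mu = list(mean)
    std = [init_std] * dim
    best_action, best_q = list(mu), q_value(mu)
    for _ in range(n_iters):
        # Sample candidate actions from the current Gaussian, clipped to bounds.
        candidates = [
            [min(high, max(low, rng.gauss(m, s))) for m, s in zip(mu, std)]
            for _ in range(n_samples)
        ]
        # Keep the candidates with the largest Q values (the elite set).
        elites = sorted(candidates, key=q_value, reverse=True)[:n_elite]
        top_q = q_value(elites[0])
        if top_q > best_q:
            best_action, best_q = list(elites[0]), top_q
        # Refit the sampling distribution to the elite set.
        mu = [sum(a[i] for a in elites) / n_elite for i in range(dim)]
        std = [
            math.sqrt(sum((a[i] - mu[i]) ** 2 for a in elites) / n_elite) + 1e-6
            for i in range(dim)
        ]
    return best_action


def toy_q(action: Sequence[float]) -> float:
    """Hypothetical critic: peaks when every component equals 0.5."""
    return -sum((a - 0.5) ** 2 for a in action)


if __name__ == "__main__":
    print(cem_select_action(toy_q, [0.0, 0.0, 0.0], rng=random.Random(0)))
```

In a real agent, `q_value` would wrap a trained critic evaluated at the current state, and `mean` would typically be the policy network output, so the value function steers the search instead of undirected noise.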
  • Fig. 1  Schematic diagram of the offline training process based on the CEM-TD3 algorithm

    Fig. 2  Block diagram of end-to-end integrated guidance and control in the online deployment phase

    Fig. 3  Training curves of reward and loss

    Fig. 4  Position tracking curves in the basic performance test

    Fig. 5  Attitude angle curves in the basic performance test

    Fig. 6  Angular velocity curves in the basic performance test

    Fig. 7  Rudder deflection angle curves in the basic performance test

    Fig. 8  Monte Carlo simulation result 1 of the robust performance test

    Fig. 9  Monte Carlo simulation result 2 of the robust performance test

    Fig. 10  Comparison of flight path angle tracking error statistics in the robust performance test

    Fig. 11  Comparison of miss distance histograms in the robust performance test

    Fig. 12  Generalization performance test results

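    Table 1  Neural network structure parameters

    | Layer | Policy network: neurons | Policy network: activation | Value network: neurons | Value network: activation |
    |---|---|---|---|---|
    | Input layer | 18 | None | 21 | None |
    | Hidden layer 1 | 256 | ReLU | 256 | ReLU |
    | Hidden layer 2 | 256 | ReLU | 256 | ReLU |
    | Output layer | 3 | Tanh | 1 | Linear |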
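The layer sizes in Table 1 can be checked with a small pure-Python sketch of the two multilayer perceptrons. It assumes fully connected layers with biases and a critic that takes the 18-dimensional observation concatenated with the 3-dimensional action (18 + 3 = 21, consistent with the table but not stated in it); the weight initialization is arbitrary and only there to make the forward pass runnable.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

POLICY_LAYERS = [18, 256, 256, 3]   # observation -> action (Table 1)
VALUE_LAYERS = [21, 256, 256, 1]    # assumed: observation + action -> Q value

Layer = Tuple[List[List[float]], List[float]]


def count_parameters(sizes: Sequence[int]) -> int:
    """Weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def init_mlp(sizes: Sequence[int], rng: random.Random) -> List[Layer]:
    """Random weights (arbitrary uniform init, an assumption) and zero biases."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        bound = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers: List[Layer], x: Sequence[float],
            output_activation: Callable[[float], float]) -> List[float]:
    """ReLU on hidden layers, `output_activation` on the last layer."""
    h = list(x)
    for idx, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        is_last = idx == len(layers) - 1
        h = [output_activation(v) if is_last else max(0.0, v) for v in z]
    return h


if __name__ == "__main__":
    rng = random.Random(0)
    actor = init_mlp(POLICY_LAYERS, rng)
    critic = init_mlp(VALUE_LAYERS, rng)
    obs = [0.1] * 18
    action = forward(actor, obs, math.tanh)               # Tanh bounds the output
    q = forward(critic, obs + action, lambda v: v)        # linear output
    print(count_parameters(POLICY_LAYERS), count_parameters(VALUE_LAYERS), q)
```

The Tanh output keeps each of the three action components in [-1, 1], so a scaling step to physical rudder deflection limits would be needed downstream; that scaling is not given on this page.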

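    Table 2  Training hyperparameters

    | Hyperparameter | Value | Hyperparameter | Value |
    |---|---|---|---|
    | Number of training episodes $ M $ | 500 | Replay buffer capacity $ N_{\text{max}} $ | $ 10^{6} $ |
    | Maximum time steps per episode $ T $ | / | Policy network noise standard deviation $\varepsilon_1$ | 0.1 |
    | Batch size $ N_{\text{b}} $ | 128 | Target network noise standard deviation $\varepsilon_2$ | 0.2 |
    | Discount factor $ \gamma $ | 0.99 | Value network 1 learning rate $ \alpha_1 $ | $ 5.0 \times 10^{-4} $ |
    | Policy network learning rate $ \beta $ | $ 1.0 \times 10^{-4} $ | Value network 2 learning rate $ \alpha_2 $ | $ 5.0 \times 10^{-4} $ |
    | Target network update coefficient $ \tau $ | $ 5.0 \times 10^{-3} $ | Delayed policy update $ k $ | 2 |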
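To show where the hyperparameters in Table 2 enter a standard TD3 update, here is a minimal sketch of the scalar pieces: the clipped double-Q target, target policy smoothing, the soft target update, and the delayed policy update. It follows the generic TD3 recipe rather than the paper's code. Reading $\varepsilon_2$ as the target smoothing noise is an interpretation, and `NOISE_CLIP` is an assumed value that does not appear in the table.

```python
import random
from typing import List, Sequence

GAMMA = 0.99              # discount factor (Table 2)
TAU = 5.0e-3              # target network update coefficient (Table 2)
POLICY_DELAY = 2          # delayed policy update k (Table 2)
TARGET_NOISE_STD = 0.2    # epsilon_2, read here as target smoothing noise
NOISE_CLIP = 0.5          # assumption: not listed in Table 2


def smoothed_target_action(target_action: Sequence[float], rng: random.Random,
                           low: float = -1.0, high: float = 1.0) -> List[float]:
    """Add clipped Gaussian noise to the target policy's action."""
    out = []
    for a in target_action:
        noise = rng.gauss(0.0, TARGET_NOISE_STD)
        noise = max(-NOISE_CLIP, min(NOISE_CLIP, noise))
        out.append(max(low, min(high, a + noise)))
    return out


def td3_target(reward: float, done: bool, q1_next: float, q2_next: float,
               gamma: float = GAMMA) -> float:
    """Bootstrapped target using the smaller of the two target critics."""
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min(q1_next, q2_next)


def soft_update(target_params: Sequence[float], online_params: Sequence[float],
                tau: float = TAU) -> List[float]:
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, online_params)]


def is_policy_update_step(step: int, delay: int = POLICY_DELAY) -> bool:
    """The actor and target networks are updated once every `delay` critic updates."""
    return step % delay == 0


if __name__ == "__main__":
    print(td3_target(reward=1.0, done=False, q1_next=2.0, q2_next=3.0))
    print(soft_update([0.0], [1.0]))
```

With $\tau = 5.0 \times 10^{-3}$ the target networks move only 0.5% of the way toward the online networks per update, which is what keeps the bootstrapped targets slowly varying.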

    Table 3  Parameter domain settings for simulated external disturbances

    | Parameter | Unit | Parameter domain |
    |---|---|---|
    | Disturbance torque $ \Delta d_{x} $ | N·m | $ \tilde{D}_{x} (-\cos (\pi t/40)+\sin (\pi t/20)) $, $\tilde{D}_{x} \in [40,\;60]$ |
    | Disturbance torque $ \Delta d_{y} $ | N·m | $ \tilde{D}_{y} (-\cos (\pi t/30)+\sin (\pi t/60)) $, $\tilde{D}_{y} \in [400,\;600]$ |
    | Disturbance torque $ \Delta d_{z} $ | N·m | $ \tilde{D}_{z} \cos (\pi t/30) \sin (\pi t/20) $, $\tilde{D}_{z} \in [800,\;1200]$ |

    Note: $ \tilde{*} $ indicates that the parameter "$ * $" is randomly selected from its parameter domain.
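The disturbance torques of Table 3 can be evaluated directly once the amplitudes are drawn. The sketch below assumes the amplitudes are sampled uniformly from their intervals (the table only says "randomly selected") and that $t$ is the simulation time in seconds; the function and dictionary names are illustrative.

```python
import math
import random
from typing import Dict, Tuple

# Amplitude intervals in N·m, taken from Table 3.
AMPLITUDE_RANGES = {"x": (40.0, 60.0), "y": (400.0, 600.0), "z": (800.0, 1200.0)}


def sample_amplitudes(rng: random.Random) -> Dict[str, float]:
    """Draw one amplitude per axis (uniform sampling is an assumption)."""
    return {axis: rng.uniform(lo, hi) for axis, (lo, hi) in AMPLITUDE_RANGES.items()}


def disturbance_torques(t: float, amp: Dict[str, float]) -> Tuple[float, float, float]:
    """Disturbance torques (dx, dy, dz) at time t, following Table 3."""
    dx = amp["x"] * (-math.cos(math.pi * t / 40) + math.sin(math.pi * t / 20))
    dy = amp["y"] * (-math.cos(math.pi * t / 30) + math.sin(math.pi * t / 60))
    dz = amp["z"] * math.cos(math.pi * t / 30) * math.sin(math.pi * t / 20)
    return dx, dy, dz


if __name__ == "__main__":
    amplitudes = sample_amplitudes(random.Random(0))
    for t in (0.0, 10.0, 20.0):
        print(t, disturbance_torques(t, amplitudes))
```

At $t = 0$ the expressions reduce to $-\tilde{D}_x$, $-\tilde{D}_y$, and $0$, which gives a quick sanity check for any implementation.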

    Table 4  Parameter domain settings for initial state variables in each episode

    | State variable | Unit | Mathematical form | Deviation band $ 3\sigma $ |
    |---|---|---|---|
    | Altitude $\tilde{H}_0$ | m | $34000 + \mathrm{N}(0,\; \sigma_H^2)$ | 300 |
    | Velocity $\tilde{V}_0$ | m/s | $3700 + \mathrm{N}(0,\; \sigma_V^2)$ | 200 |
    | Longitude $\tilde{\lambda}_0$ | (°) | $120 + \mathrm{N}(0,\; \sigma_\lambda^2)$ | 0.1 |
    | Latitude $\tilde{\phi}_0$ | (°) | $25 + \mathrm{N}(0,\; \sigma_\phi^2)$ | 0.1 |
    | Flight path angle $\tilde{\theta}_0$ | (°) | $\theta_0 + \mathrm{N}(0,\; \sigma_\theta^2)$ | 1.0 |
    | Heading angle $\tilde{\psi}_{v_0}$ | (°) | $\psi_{v_0} + \mathrm{N}(0,\; \sigma_{\psi_v}^2)$ | 1.0 |
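Because Table 4 specifies each deviation band as $3\sigma$, the standard deviation used for sampling is one third of the listed band. The sketch below draws one set of initial conditions under that reading. The nominal angles $\theta_0$ and $\psi_{v_0}$ are not given in the table, so they are caller-supplied arguments, and the values in the usage example are placeholders.

```python
import random
from typing import Dict

# (nominal value, 3-sigma deviation band) from Table 4.
INITIAL_STATE_SPEC = {
    "altitude_m": (34000.0, 300.0),
    "velocity_mps": (3700.0, 200.0),
    "longitude_deg": (120.0, 0.1),
    "latitude_deg": (25.0, 0.1),
}
ANGLE_BAND_DEG = 1.0  # 3-sigma band for both trajectory angles (Table 4)


def sigma_from_band(band_3sigma: float) -> float:
    """Convert a 3-sigma deviation band into a standard deviation."""
    return band_3sigma / 3.0


def sample_initial_state(rng: random.Random, theta0_deg: float,
                         psi_v0_deg: float) -> Dict[str, float]:
    """Draw one episode's initial state as nominal + N(0, sigma^2)."""
    state = {
        name: nominal + rng.gauss(0.0, sigma_from_band(band))
        for name, (nominal, band) in INITIAL_STATE_SPEC.items()
    }
    angle_sigma = sigma_from_band(ANGLE_BAND_DEG)
    state["flight_path_angle_deg"] = theta0_deg + rng.gauss(0.0, angle_sigma)
    state["heading_angle_deg"] = psi_v0_deg + rng.gauss(0.0, angle_sigma)
    return state


if __name__ == "__main__":
    # theta0_deg and psi_v0_deg below are placeholder nominal values.
    print(sample_initial_state(random.Random(0), theta0_deg=-2.0, psi_v0_deg=90.0))
```

A Gaussian draw is unbounded, so roughly 0.3% of samples per variable fall outside the listed band; whether the authors truncate such samples is not stated here.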

    Table 5  Parameter domain settings for simulated aerodynamic parameter uncertainties

    | Parameter | Mathematical form | Deviation band $3\sigma$ |
    |---|---|---|
    | Aerodynamic force coefficient deviations $ \Delta \tilde{C}_{L},\; \Delta \tilde{C}_{D},\; \Delta \tilde{C}_{Y}$ | $\mathrm{N}(0,\; \sigma^2)$ | 40% |
    | Aerodynamic moment coefficient deviations $ \Delta \tilde{C}_{mx},\;\Delta \tilde{C}_{my},\;\Delta \tilde{C}_{mz} $ | $\mathrm{N}(0,\; \sigma^2)$ | 40% |
    | Atmospheric density $ \tilde{\rho}_A $ | $\mathrm{N}(0,\; \sigma^2)$ | 40% |
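One plausible way to apply the uncertainties in Table 5 is as relative deviations of the nominal values, with $3\sigma = 0.40$. The table does not say whether the perturbation is multiplicative or whether the density row is also a relative deviation, so both are assumptions in this sketch, as are the parameter keys and the nominal lift coefficient in the usage example.

```python
import random
from typing import Dict

RELATIVE_BAND_3SIGMA = 0.40  # 40% deviation band from Table 5
PARAMETER_NAMES = ("C_L", "C_D", "C_Y", "C_mx", "C_my", "C_mz", "rho_A")


def sample_relative_deviations(rng: random.Random) -> Dict[str, float]:
    """Draw one relative deviation per parameter from N(0, sigma^2), 3*sigma = 0.40."""
    sigma = RELATIVE_BAND_3SIGMA / 3.0
    return {name: rng.gauss(0.0, sigma) for name in PARAMETER_NAMES}


def apply_relative_deviation(nominal: float, delta: float) -> float:
    """Perturb a nominal value multiplicatively (assumed form)."""
    return nominal * (1.0 + delta)


if __name__ == "__main__":
    deltas = sample_relative_deviations(random.Random(0))
    nominal_lift_coefficient = 0.5  # placeholder value, not from the paper
    print(apply_relative_deviation(nominal_lift_coefficient, deltas["C_L"]))
```

Drawing a fresh set of deviations per episode, together with the initial states and disturbance amplitudes, is the usual way such tables are used for domain randomization and Monte Carlo testing.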
Publication history
  • Published online:  2025-12-27
