

无人机反应式扰动流体路径规划 (UAV Reactive Interfered Fluid Path Planning)

吴健发 (Wu Jian-Fa), 王宏伦 (Wang Hong-Lun), 王延祥 (Wang Yan-Xiang), 刘一恒 (Liu Yi-Heng)

引用本文: 吴健发, 王宏伦, 王延祥, 刘一恒. 无人机反应式扰动流体路径规划. 自动化学报, 2023, 49(2): 272−287 doi: 10.16383/j.aas.c210231
Citation: Wu Jian-Fa, Wang Hong-Lun, Wang Yan-Xiang, Liu Yi-Heng. UAV reactive interfered fluid path planning. Acta Automatica Sinica, 2023, 49(2): 272−287 doi: 10.16383/j.aas.c210231

doi: 10.16383/j.aas.c210231

UAV Reactive Interfered Fluid Path Planning

Funds: Supported by the National Natural Science Foundation of China (62173022, 61673042, 61175084)
More Information
    Author Bio:

    WU Jian-Fa  Postdoctoral researcher at Beijing Institute of Control Engineering. His research interest covers intelligent decision-making and coordinated control of flight vehicles. E-mail: jianfa_wu@163.com

    WANG Hong-Lun  Professor at the School of Automation Science and Electrical Engineering, Beihang University. His research interest covers autonomous and intelligent control of flight vehicles, anti-disturbance control, and path planning and precise tracking control of unmanned systems. Corresponding author of this paper. E-mail: hl_wang_2002@126.com

    WANG Yan-Xiang  Ph.D. candidate at the School of Automation Science and Electrical Engineering, Beihang University. His research interest covers UAV path planning, and precise guidance and control for aerial refueling. E-mail: wyxjy51968@163.com

    LIU Yi-Heng  Ph.D. candidate at the School of Automation Science and Electrical Engineering, Beihang University. His research interest covers flight control, trajectory planning, and machine learning. E-mail: 18810010709@163.com

  • Abstract: For complex three-dimensional obstacle environments, a deep-reinforcement-learning-based reactive interfered fluid path planning architecture for unmanned aerial vehicles (UAVs) is proposed. The architecture adopts a constrained interfered fluid dynamical system algorithm as the basic path planning method. According to the relative states between the UAV and each obstacle, as well as the obstacle type, an actor network trained with the deep deterministic policy gradient (DDPG) algorithm generates the reaction coefficient and direction coefficient of each obstacle online; the corresponding total disturbance matrix is then computed and used to modify the UAV's flight path, achieving reactive obstacle avoidance. In addition, a standardized modeling method for the deep reinforcement learning training environment, matched to the proposed path planning method, is studied. Simulation results show that, with roughly the same path quality, the proposed method clearly outperforms online path planning methods based on predictive control in terms of real-time performance.
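
To make the planning cycle described in the abstract concrete, below is a minimal Python sketch of one reactive interfered-fluid step. It is written under stated assumptions rather than the paper's exact C-IFDS formulation: a nominal flow toward the goal is bent by per-obstacle disturbance matrices whose repulsive and tangential weights are set by a reaction coefficient rho and a direction coefficient sigma. In the paper these coefficients are produced online by the DDPG-trained actor network; here a dummy actor returning constants stands in, and the obstacle model, the weighting form, and the helper names (initial_flow, SphereObstacle, modulation_matrix, reactive_step) are illustrative assumptions only.

```python
import numpy as np

def initial_flow(p, goal, speed=1.0):
    # Nominal flow field: a constant-speed velocity pointing from the current
    # position toward the goal (the undisturbed "fluid" the path follows).
    d = goal - p
    return speed * d / (np.linalg.norm(d) + 1e-9)

class SphereObstacle:
    # Toy spherical obstacle used only for illustration.
    def __init__(self, center, radius):
        self.center = np.asarray(center, dtype=float)
        self.radius = float(radius)

    def gamma(self, p):
        # Implicit surface function: equals 1 on the surface, grows outside it.
        return float(np.sum(((p - self.center) / self.radius) ** 2))

    def normal(self, p):
        # Outward radial direction (the repulsive component acts along it).
        n = p - self.center
        return n / (np.linalg.norm(n) + 1e-9)

def modulation_matrix(p, obs, rho, sigma):
    # Per-obstacle disturbance matrix. The reaction coefficient rho sets how
    # strongly the repulsive term acts; the direction coefficient sigma sets the
    # strength and side of the tangential (circumventing) term. Illustrative form.
    g = obs.gamma(p)
    n = obs.normal(p)
    t = np.cross(n, np.array([0.0, 0.0, 1.0]))   # one possible tangential direction
    t = t / (np.linalg.norm(t) + 1e-9)
    w_rep = 1.0 / abs(g) ** (1.0 / rho)
    w_tan = np.sign(sigma) / abs(g) ** (1.0 / max(abs(sigma), 1e-3))
    return np.eye(3) - w_rep * np.outer(n, n) + w_tan * np.outer(t, n)

def reactive_step(p, goal, obstacles, actor, dt=0.1):
    # One planning cycle: the actor maps each UAV-obstacle relative state to
    # (rho, sigma); the per-obstacle matrices are multiplied into a total
    # disturbance matrix that bends the nominal flow, yielding the next waypoint.
    u = initial_flow(p, goal)
    m_total = np.eye(3)
    for obs in obstacles:
        rho, sigma = actor(p, obs)   # stands in for the trained DDPG actor network
        m_total = m_total @ modulation_matrix(p, obs, rho, sigma)
    return p + dt * (m_total @ u)

# Example run with a dummy actor that returns constant coefficients.
if __name__ == "__main__":
    goal = np.array([10.0, 0.0, 2.0])
    obstacles = [SphereObstacle([5.0, 0.1, 2.0], 1.5)]
    dummy_actor = lambda p, obs: (1.5, 1.0)
    p = np.array([0.0, 0.0, 2.0])
    for _ in range(150):
        p = reactive_step(p, goal, obstacles, dummy_actor)
    print("final position:", p)
```

In the full architecture, replacing the dummy actor with the trained network is what makes the planner "reactive": the coefficients, and hence the shape of the avoidance maneuver, change online with the UAV-obstacle relative state and obstacle type.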
  • Fig. 1  Effects of different combinations of reaction coefficients and direction coefficients on planned paths

    Fig. 2  DDPG training mechanism of the proposed reactive path planning

    Fig. 3  Structures of the critic network and the actor network

    Fig. 4  Overall flow chart of the DRL-based reactive interfered fluid path planning

    Fig. 5  Differences in the setting of UAV initial positions: Taking the static hemispherical obstacle and the dynamic spherical threat as examples

    Fig. 6  Training environment of UAV reactive path planning for static hemispherical obstacles

    Fig. 7  Probability distribution of the initial $ {\theta _h} $

    Fig. 8  Comparison of some constrained states and planned paths when using IFDS and C-IFDS

    Fig. 9  Reward functions in the DDPG training process

    Fig. 10  3D paths planned online in Case 2

    Fig. 11  Closest distances to the equivalent surface of the dynamic threat in Case 2

    Fig. 12  Comparison of planning times in Case 2

    Fig. 13  Reward functions in the DDPG training process when the proposed environment modeling method is not adopted: Taking the static hemispherical obstacle as an example

    Fig. 14  UAV flight paths and planned paths at different times in Case 3

    Fig. 15  Closest distances to the equivalent surface of each dynamic threat in Case 3

    Table 1  Comparison of the length and smoothness indexes of the planned paths in Case 2

    Index       | Proposed method | Comparison 1 | Comparison 2 | Comparison 3
    Length (km) | 7.56            | 8.49         | 7.68         | 7.65
    Smoothness  | 0.1318          | 0.3506       | 0.1593       | 0.1528
Publication history
  • Received: 2021-03-29
  • Accepted: 2021-09-17
  • Published online: 2021-10-15
  • Issue date: 2023-02-20
