
Autonomous Perception-Planning-Control Strategy Based on Deep Reinforcement Learning for Unmanned Aerial Vehicles

Lv Mao-Long, Ding Chen-Bo, Han Hao-Ran, Duan Hai-Bin

Citation: Lv Mao-Long, Ding Chen-Bo, Han Hao-Ran, Duan Hai-Bin. Autonomous perception-planning-control strategy based on deep reinforcement learning for unmanned aerial vehicles. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c240639

doi: 10.16383/j.aas.c240639 cstr: 32138.14.j.aas.c240639

Funds: Supported by National Natural Science Foundation of China (62303489, GKJJ24050502, 62350048, T2121003), China Postdoctoral Science Foundation (2022M723877), Postdoctoral Special Funding (2023T160790), China Postdoctoral International Exchange Introduction Program (YJ20220347), Shaanxi Provincial Youth Talent Promotion Project (20220101), and Shaanxi Natural Science Basic Research Program (2024JC-YBQN-0668)
More Information
    Author Bio:

    LV Mao-Long Associate professor at Air Force Engineering University. He received his Ph.D. degree from Delft University of Technology, the Netherlands. His research interest covers cooperative strike of UAV swarms and manned-unmanned intelligent air combat. Corresponding author of this paper. E-mail: maolonglv@163.com

    DING Chen-Bo Ph.D. candidate at the Graduate School, Air Force Engineering University. His research interest covers intelligent unmanned combat, manned-unmanned cooperative combat, and cooperative strike of UAV swarms. E-mail: chenbo_ding2024@163.com

    HAN Hao-Ran Ph.D. candidate at the School of Information and Communication Engineering, University of Electronic Science and Technology of China. His main research interest is reinforcement learning techniques and applications. E-mail: hanadam@163.com

    DUAN Hai-Bin Professor at Beijing University of Aeronautics and Astronautics. His main research interest is autonomous control of UAVs based on bionic intelligence. E-mail: hbduan@buaa.edu.cn

  • Abstract: In recent years, with the rapid development of deep reinforcement learning (DRL), its application to autonomous UAV navigation has attracted increasingly wide attention. However, in complex unknown environments, existing DRL-based autonomous navigation algorithms are often limited by their dependence on global information and by the constraints of the specific training environment, which greatly restricts their application potential across diverse scenarios. To address these problems, a multi-scale input is proposed to balance the receptive field against the state dimension, and a truncation operation is introduced so that the agent can operate in enlarged environments. In addition, an autonomous perception-planning-control architecture is constructed, endowing the UAV with the ability to navigate autonomously in diverse complex environments.
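    The multi-scale input described in the abstract can be made concrete with a small sketch. The NumPy code below is a minimal illustration under assumed parameters, not the paper's implementation: a fine-resolution crop of the voxel occupancy grid around the UAV preserves local detail, while coarser max-pooled crops widen the receptive field at the same state dimension. The crop half-width `half` and pooling factor `k` are illustrative assumptions.

    ```python
    import numpy as np

    def crop(grid, center, half):
        """Axis-aligned cube crop of a 3D occupancy grid, zero-padded at borders."""
        out = np.zeros((2 * half,) * 3, dtype=grid.dtype)
        lo = np.maximum(center - half, 0)
        hi = np.minimum(center + half, grid.shape)
        out_lo = lo - (center - half)
        out_hi = out_lo + (hi - lo)
        out[out_lo[0]:out_hi[0], out_lo[1]:out_hi[1], out_lo[2]:out_hi[2]] = \
            grid[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
        return out

    def max_pool3d(x, k):
        """Coarsen a cubic grid by factor k with max pooling (occupancy-preserving)."""
        n = x.shape[0] // k
        return x[:n*k, :n*k, :n*k].reshape(n, k, n, k, n, k).max(axis=(1, 3, 5))

    def multi_scale_state(grid, pos, half=4, k=2):
        """Three crops of growing extent but identical dimension around pos."""
        fine = crop(grid, pos, half)                        # scale I: local detail
        mid = max_pool3d(crop(grid, pos, half * k), k)      # scale II: wider view
        wide = max_pool3d(crop(grid, pos, half * k * k), k * k)  # scale III: widest
        return np.concatenate([fine.ravel(), mid.ravel(), wide.ravel()])

    # Usage: state = multi_scale_state(occupancy_grid, np.array([ix, iy, iz]))
    ```

    Because every scale is pooled down to the same cube size, tripling the sensed extent only triples (rather than cubes) the state dimension, which is the trade-off the abstract refers to.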
  • Fig. 1  Observation of the obstacle environment used for algorithm training

    Fig. 2  Representation of the perceptual state coverage space

    Fig. 3  Illustrations of different truncation operations

    Fig. 4  Autonomous navigation framework

    Fig. 5  Navigation performance with inputs at different scales

    Fig. 6  Comparison of navigation performance in Task 1

    Fig. 7  Navigation performance under different truncation parameters

    Fig. 8  Illustration of the truncation effect

    Fig. 9  Navigation results of the DRL-based autonomous navigation approach

    Fig. 10  Navigation results of the A*-based autonomous navigation approach

    Fig. 11  Illustration of the short-sighted decision-making problem

    Fig. 12  Illustration of the trajectory oscillation problem

    Table 1  Pseudocode for the Double DQN algorithm

    Algorithm 1. Double DQN for UAV path planning
    Input. Environment $\tilde X$; initialized network parameters $\alpha$
    1: for each training episode = 1, 2, ... do:
    2:   initialize the UAV start position ${\tilde p_\text{init}} \sim U({\tilde M_\text{free}})$
    3:   initialize the UAV goal position ${\tilde p_\text{d}} \sim U({\tilde M_\text{free}} \backslash \{{\tilde p_\text{init}}\})$
    4:   for each time step t = 0, 1, ... do:
    5:     with probability $\varepsilon$, select a uniformly random action $a_t = U(A)$
    6:     otherwise, select the greedy action $a_t = \arg \max_a Q(s_t,\; a)$
    7:     execute $a_t$, observing the next state $s_{t+1}$ and reward $r_t$
    8:     store the transition $(s_t,\; a_t,\; s_{t+1},\; r_t)$ in the replay buffer $D$
    9:     uniformly sample $N$ transitions $(s_i,\; a_i,\; s_{i+1},\; r_i)$, $1 \le i \le N$
    10:    for each sampled transition, update the main network parameters $\alpha$ with the loss function of Eq. (7)
    11:    every $C$ training steps, copy the main network parameters to the target network: $\hat \alpha \leftarrow \alpha$
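    As a reading aid, here is a minimal PyTorch sketch of the update step in Algorithm 1. It is not the authors' code: the network shape and hyperparameters are assumptions, and the paper's loss of Eq. (7) is stood in for by a standard Huber loss on the Double DQN target.

    ```python
    import torch
    import torch.nn as nn

    class QNet(nn.Module):
        """State -> action-value estimates for the discrete waypoint actions."""
        def __init__(self, state_dim: int, n_actions: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 128), nn.ReLU(),
                nn.Linear(128, 128), nn.ReLU(),
                nn.Linear(128, n_actions),
            )

        def forward(self, s: torch.Tensor) -> torch.Tensor:
            return self.net(s)

    def double_dqn_update(main, target, optimizer, batch, gamma=0.99):
        """One gradient step on a sampled batch (algorithm line 10)."""
        s, a, r, s_next, done = batch  # states, actions, rewards, next states, done flags
        with torch.no_grad():
            # Decoupled selection/evaluation is what makes this "Double" DQN:
            # the online network picks the greedy next action ...
            next_a = main(s_next).argmax(dim=1, keepdim=True)
            # ... and the target network evaluates it.
            q_next = target(s_next).gather(1, next_a).squeeze(1)
            y = r + gamma * (1.0 - done) * q_next
        q = main(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.smooth_l1_loss(q, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Target sync (algorithm line 11): every C steps,
    #   target.load_state_dict(main.state_dict())
    ```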

    Table 2  Pseudocode for the UAV autonomous navigation technique

    Algorithm 2. UAV autonomous navigation
    Input. Environment $X$; network parameters $\alpha$; UAV initial position ${p_\text{init}}$; desired position ${p_\text{d}}$
    1: initialize the sensing, planning, and flight-control timers ${t^\text{sense}}$, ${t^\text{plan}}$, and ${t^\text{control}}$ to zero
    2: initialize the voxel environment $\tilde X$ as obstacle-free space
    3: while the UAV has not reached the goal position, i.e., while $\parallel p - {p_\text{d}}{\parallel _2}$ exceeds the arrival tolerance, do:
    4:   run the path planning and trajectory optimization module:
    5:     initialize the discrete waypoint list $\tilde P = [{{\tilde p}_0}]$ with initial position ${\tilde p_0} = \operatorname{floor}(p/2\tilde d)$
    6:     for each time step $t = 0,\; 1,\; \ldots,\; x - 1$ do:
    7:       select the action ${a_t} = \arg \max_a Q({s_t},\; a)$ using the network parameters $\alpha$
    8:       execute the action ${a_t}$ in the voxel environment
    9:       append the new waypoint: $\tilde P.\text{append}({\tilde p_{t + 1}})$
    10:    optimize the flight trajectory $\Gamma$ using Eq. (13)
    11:    reset the flight-control timer: ${t^\text{control}} = 0$
    12:  update the planning timer: ${t^\text{plan}} \mathrel{+}= \Delta t$
    13:  when the sensing condition $\bmod({t^\text{sense}},\; {T^\text{sense}}) == 0$ holds:
    14:    acquire the newly sensed obstacle environment ${\tilde M'_\text{obs}}$
    15:    if the sensed obstacle environment conflicts with the planned path, ${\tilde M_\text{obs}} \cap \tilde P \ne \emptyset$:
    16:      reset the planning timer: ${t^\text{plan}} = 0$
    17:    update the obstacle environment: ${\tilde M_\text{obs}} = {\tilde M_\text{obs}} \cup {\tilde M'_\text{obs}}$
    18:  update the sensing timer: ${t^\text{sense}} \mathrel{+}= \Delta t$
    19:  when the trajectory-optimization condition ${t^\text{control}} == 0$ holds:
    20:    re-optimize the flight trajectory $\Gamma$
    21:  generate PWM signals
    22:  update the flight-control timer: ${t^\text{control}} \mathrel{+}= \Delta t$
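    The timing logic of Algorithm 2 can be summarized in a schematic Python loop. All callables (`plan`, `sense`, `follow`, `at_goal`) and the rate parameters are hypothetical placeholders; the sketch only mirrors the sense-plan-control scheduling and the replan trigger on path conflicts.

    ```python
    import itertools
    from typing import Callable, List, Set, Tuple

    Voxel = Tuple[int, int, int]

    def navigation_loop(
        plan: Callable[[Set[Voxel]], List[Voxel]],    # DRL planner + trajectory optimization
        sense: Callable[[], Set[Voxel]],              # newly observed obstacle voxels (M'_obs)
        follow: Callable[[List[Voxel], float], None], # trajectory tracking; emits PWM signals
        at_goal: Callable[[], bool],
        dt: float = 0.02,        # loop step Δt (assumed)
        sense_steps: int = 10,   # sensing period T_sense, in loop steps (assumed)
    ) -> None:
        t_plan = t_control = 0.0
        obstacles: Set[Voxel] = set()     # accumulated obstacle map, M_obs
        path: List[Voxel] = []
        for step in itertools.count():
            if at_goal():
                break
            # Planning branch (lines 4-12): replan whenever the timer was reset.
            if t_plan == 0.0:
                path = plan(obstacles)    # DQN rollout + trajectory optimization
                t_control = 0.0           # trajectory changed; restart control timing
            t_plan += dt
            # Sensing branch (lines 13-18): runs every sense_steps loop steps.
            if step % sense_steps == 0:
                new_obs = sense()
                if new_obs & set(path):   # new obstacles block the planned path
                    t_plan = 0.0          # force an immediate replan
                obstacles |= new_obs
            # Control branch (lines 19-22): track the trajectory, emit PWM.
            follow(path, t_control)
            t_control += dt
    ```

    The design point the algorithm encodes is that sensing, replanning, and control run at different rates, with an event-driven replan only when newly sensed obstacles actually intersect the current path.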

    Table 3  Navigation performance of agents with different scale inputs after iterative training

    Input                Success rate (%)  Collision rate (%)  Failure rate (%)  Path length (m)
    Scale I state             89.61              0.02               10.37             8.73
    Scale II state            88.25              0.72               11.03             8.79
    Scale III state            0.51             17.66               81.83            11.56
    Multi-scale state         97.19              0.08                2.74             8.62

    Table 4  Navigation performance in different voxel environments

    Environment                              Success rate (%)  Collision rate (%)  Failure rate (%)  Path length (m)
    Training environment                          97.19              0.08               2.74             8.62
    Test environment (without truncation)         77.20              6.52              16.29            20.30
    Test environment (with truncation)            96.93              0.17               2.89            21.90

    Table 5  Navigation comparison of the A* algorithm and the DRL path planner

                               Task 1     Task 2     Task 3      Task 4
    Straight-line distance     9.65 m    10.84 m    13.85 m     35.07 m
    A* algorithm              11.68 m    11.97 m    collision   collision
    DRL path planner          11.59 m    13.39 m    18.80 m     40.26 m
Publication history
  • Received:  2024-09-20
  • Accepted:  2024-11-21
  • Published online:  2025-02-10
