
Autonomous Perception-Planning-Control Strategy Based on Deep Reinforcement Learning for Unmanned Aerial Vehicles

Lv Mao-Long, Ding Chen-Bo, Han Hao-Ran, Duan Hai-Bin

Citation: Lv Mao-Long, Ding Chen-Bo, Han Hao-Ran, Duan Hai-Bin. Autonomous perception-planning-control strategy based on deep reinforcement learning for unmanned aerial vehicles. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c240639

doi: 10.16383/j.aas.c240639 cstr: 32138.14.j.aas.c240639

Funds: Supported by National Natural Science Foundation of China (62303489, GKJJ24050502, 62350048, T2121003), China Postdoctoral Science Foundation (2022M723877), Postdoctoral Special Funding (2023T160790), China Postdoctoral International Exchange Introduction Program (YJ20220347), Shaanxi Provincial Youth Talent Promotion Project (20220101), and Shaanxi Natural Science Basic Research Program (2024JC-YBQN-0668)
More Information
    Author Bio:

    LV Mao-Long Associate professor at Air Force Engineering University. He received his Ph.D. degree from Delft University of Technology, the Netherlands. His research interest covers cooperative strike of UAV swarms and manned-unmanned intelligent air combat. Corresponding author of this paper. E-mail: maolonglv@163.com

    DING Chen-Bo Ph.D. candidate at the Graduate School, Air Force Engineering University. His research interest covers intelligent unmanned combat, manned-unmanned cooperative combat, and cooperative strike of UAV swarms. E-mail: chenbo_ding2024@163.com

    HAN Hao-Ran Ph.D. candidate at the School of Information and Communication Engineering, University of Electronic Science and Technology of China. His main research interest is reinforcement learning techniques and applications. E-mail: hanadam@163.com

    DUAN Hai-Bin Professor at Beijing University of Aeronautics and Astronautics. His main research interest is autonomous control of UAVs based on bionic intelligence. E-mail: hbduan@buaa.edu.cn

  • Abstract: In recent years, with the rapid development of deep reinforcement learning (DRL), its application to autonomous UAV navigation has attracted increasingly wide attention. However, in complex unknown environments, existing DRL-based autonomous navigation algorithms are often limited by their dependence on global information and by the constraints of the specific training environment, which greatly restricts their application potential across diverse scenarios. To address these problems, a multi-scale input is proposed to balance the receptive field against the state dimension, and a truncation operation is introduced so that the agent can operate in enlarged environments. In addition, an autonomous perception-planning-control architecture is constructed, endowing the UAV with the ability to navigate autonomously in diverse complex environments.
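    The multi-scale input described in the abstract can be made concrete with a small sketch. The NumPy code below is a minimal illustration under assumed parameters, not the paper's implementation: a fine-resolution crop of the voxel occupancy grid around the UAV preserves local detail, while coarser max-pooled crops widen the receptive field at the same state dimension. The crop half-width `half` and pooling factor `k` are illustrative assumptions.

    ```python
    import numpy as np

    def crop(grid, center, half):
        """Axis-aligned cube crop of a 3D occupancy grid, zero-padded at borders."""
        out = np.zeros((2 * half,) * 3, dtype=grid.dtype)
        lo = np.maximum(center - half, 0)
        hi = np.minimum(center + half, grid.shape)
        out_lo = lo - (center - half)
        out_hi = out_lo + (hi - lo)
        out[out_lo[0]:out_hi[0], out_lo[1]:out_hi[1], out_lo[2]:out_hi[2]] = \
            grid[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
        return out

    def max_pool3d(x, k):
        """Coarsen a cubic grid by factor k with max pooling (occupancy-preserving)."""
        n = x.shape[0] // k
        return x[:n*k, :n*k, :n*k].reshape(n, k, n, k, n, k).max(axis=(1, 3, 5))

    def multi_scale_state(grid, pos, half=4, k=2):
        """Three crops of growing extent but identical dimension around pos."""
        fine = crop(grid, pos, half)                        # scale I: local detail
        mid = max_pool3d(crop(grid, pos, half * k), k)      # scale II: wider view
        wide = max_pool3d(crop(grid, pos, half * k * k), k * k)  # scale III: widest
        return np.concatenate([fine.ravel(), mid.ravel(), wide.ravel()])

    # Usage: state = multi_scale_state(occupancy_grid, np.array([ix, iy, iz]))
    ```

    Because every scale is pooled down to the same cube size, tripling the sensed extent only triples (rather than cubes) the state dimension, which is the trade-off the abstract refers to.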
  • Fig. 1  Observation of the obstacle environment used for algorithm training

    Fig. 2  Representation of the perceptual state coverage space

    Fig. 3  Illustrations of different truncation operations

    Fig. 4  Autonomous navigation framework

    Fig. 5  Navigation performance with inputs at different scales

    Fig. 6  Comparison of navigation performance in Task 1

    Fig. 7  Navigation performance under different truncation parameters

    Fig. 8  Illustration of the truncation effect

    Fig. 9  Navigation results of the DRL-based autonomous navigation approach

    Fig. 10  Navigation results of the A*-based autonomous navigation approach

    Fig. 11  Illustration of the short-sighted decision-making problem

    Fig. 12  Illustration of the trajectory oscillation problem

    Table 1  Pseudocode for the Double DQN algorithm

    Algorithm 1. Double DQN for UAV path planning
    Input. Environment $\tilde X$; initialized network parameters $\alpha$
    1: for each training episode = 1, 2, ... do:
    2:   initialize the UAV start position ${\tilde p_\text{init}} \sim U({\tilde M_\text{free}})$
    3:   initialize the UAV goal position ${\tilde p_\text{d}} \sim U({\tilde M_\text{free}} \backslash \{{\tilde p_\text{init}}\})$
    4:   for each time step t = 0, 1, ... do:
    5:     with probability $\varepsilon$, select a uniformly random action $a_t = U(A)$
    6:     otherwise, select the greedy action $a_t = \arg \max_a Q(s_t,\; a)$
    7:     execute $a_t$, observing the next state $s_{t+1}$ and reward $r_t$
    8:     store the transition $(s_t,\; a_t,\; s_{t+1},\; r_t)$ in the replay buffer $D$
    9:     uniformly sample $N$ transitions $(s_i,\; a_i,\; s_{i+1},\; r_i)$, $1 \le i \le N$
    10:    for each sampled transition, update the main network parameters $\alpha$ with the loss function of Eq. (7)
    11:    every $C$ training steps, copy the main network parameters to the target network: $\hat \alpha \leftarrow \alpha$
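    As a reading aid, here is a minimal PyTorch sketch of the update step in Algorithm 1. It is not the authors' code: the network shape and hyperparameters are assumptions, and the paper's loss of Eq. (7) is stood in for by a standard Huber loss on the Double DQN target.

    ```python
    import torch
    import torch.nn as nn

    class QNet(nn.Module):
        """State -> action-value estimates for the discrete waypoint actions."""
        def __init__(self, state_dim: int, n_actions: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 128), nn.ReLU(),
                nn.Linear(128, 128), nn.ReLU(),
                nn.Linear(128, n_actions),
            )

        def forward(self, s: torch.Tensor) -> torch.Tensor:
            return self.net(s)

    def double_dqn_update(main, target, optimizer, batch, gamma=0.99):
        """One gradient step on a sampled batch (algorithm line 10)."""
        s, a, r, s_next, done = batch  # states, actions, rewards, next states, done flags
        with torch.no_grad():
            # Decoupled selection/evaluation is what makes this "Double" DQN:
            # the online network picks the greedy next action ...
            next_a = main(s_next).argmax(dim=1, keepdim=True)
            # ... and the target network evaluates it.
            q_next = target(s_next).gather(1, next_a).squeeze(1)
            y = r + gamma * (1.0 - done) * q_next
        q = main(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.smooth_l1_loss(q, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Target sync (algorithm line 11): every C steps,
    #   target.load_state_dict(main.state_dict())
    ```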

    Table 2  Pseudocode for the UAV autonomous navigation technique

    Algorithm 2. UAV autonomous navigation
    Input. Environment $X$; network parameters $\alpha$; UAV initial position ${p_\text{init}}$; desired position ${p_\text{d}}$
    1: initialize the sensing, planning, and flight-control timers ${t^\text{sense}}$, ${t^\text{plan}}$, and ${t^\text{control}}$ to zero
    2: initialize the voxel environment $\tilde X$ as obstacle-free space
    3: while the UAV has not reached the goal position, i.e., while $\parallel p - {p_\text{d}}{\parallel _2}$ exceeds the arrival tolerance, do:
    4:   run the path planning and trajectory optimization module:
    5:     initialize the discrete waypoint list $\tilde P = [{{\tilde p}_0}]$ with initial position ${\tilde p_0} = \operatorname{floor}(p/2\tilde d)$
    6:     for each time step $t = 0,\; 1,\; \ldots,\; x - 1$ do:
    7:       select the action ${a_t} = \arg \max_a Q({s_t},\; a)$ using the network parameters $\alpha$
    8:       execute the action ${a_t}$ in the voxel environment
    9:       append the new waypoint: $\tilde P.\text{append}({\tilde p_{t + 1}})$
    10:    optimize the flight trajectory $\Gamma$ using Eq. (13)
    11:    reset the flight-control timer: ${t^\text{control}} = 0$
    12:  update the planning timer: ${t^\text{plan}} \mathrel{+}= \Delta t$
    13:  when the sensing condition $\bmod({t^\text{sense}},\; {T^\text{sense}}) == 0$ holds:
    14:    acquire the newly sensed obstacle environment ${\tilde M'_\text{obs}}$
    15:    if the sensed obstacle environment conflicts with the planned path, ${\tilde M_\text{obs}} \cap \tilde P \ne \emptyset$:
    16:      reset the planning timer: ${t^\text{plan}} = 0$
    17:    update the obstacle environment: ${\tilde M_\text{obs}} = {\tilde M_\text{obs}} \cup {\tilde M'_\text{obs}}$
    18:  update the sensing timer: ${t^\text{sense}} \mathrel{+}= \Delta t$
    19:  when the trajectory-optimization condition ${t^\text{control}} == 0$ holds:
    20:    re-optimize the flight trajectory $\Gamma$
    21:  generate PWM signals
    22:  update the flight-control timer: ${t^\text{control}} \mathrel{+}= \Delta t$
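    The timing logic of Algorithm 2 can be summarized in a schematic Python loop. All callables (`plan`, `sense`, `follow`, `at_goal`) and the rate parameters are hypothetical placeholders; the sketch only mirrors the sense-plan-control scheduling and the replan trigger on path conflicts.

    ```python
    import itertools
    from typing import Callable, List, Set, Tuple

    Voxel = Tuple[int, int, int]

    def navigation_loop(
        plan: Callable[[Set[Voxel]], List[Voxel]],    # DRL planner + trajectory optimization
        sense: Callable[[], Set[Voxel]],              # newly observed obstacle voxels (M'_obs)
        follow: Callable[[List[Voxel], float], None], # trajectory tracking; emits PWM signals
        at_goal: Callable[[], bool],
        dt: float = 0.02,        # loop step Δt (assumed)
        sense_steps: int = 10,   # sensing period T_sense, in loop steps (assumed)
    ) -> None:
        t_plan = t_control = 0.0
        obstacles: Set[Voxel] = set()     # accumulated obstacle map, M_obs
        path: List[Voxel] = []
        for step in itertools.count():
            if at_goal():
                break
            # Planning branch (lines 4-12): replan whenever the timer was reset.
            if t_plan == 0.0:
                path = plan(obstacles)    # DQN rollout + trajectory optimization
                t_control = 0.0           # trajectory changed; restart control timing
            t_plan += dt
            # Sensing branch (lines 13-18): runs every sense_steps loop steps.
            if step % sense_steps == 0:
                new_obs = sense()
                if new_obs & set(path):   # new obstacles block the planned path
                    t_plan = 0.0          # force an immediate replan
                obstacles |= new_obs
            # Control branch (lines 19-22): track the trajectory, emit PWM.
            follow(path, t_control)
            t_control += dt
    ```

    The design point the algorithm encodes is that sensing, replanning, and control run at different rates, with an event-driven replan only when newly sensed obstacles actually intersect the current path.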

    Table 3  Navigation performance of agents with different scale inputs after iterative training

    Input                Success rate (%)  Collision rate (%)  Failure rate (%)  Path length (m)
    Scale I state             89.61              0.02               10.37             8.73
    Scale II state            88.25              0.72               11.03             8.79
    Scale III state            0.51             17.66               81.83            11.56
    Multi-scale state         97.19              0.08                2.74             8.62

    Table 4  Navigation performance in different voxel environments

    Environment                              Success rate (%)  Collision rate (%)  Failure rate (%)  Path length (m)
    Training environment                          97.19              0.08               2.74             8.62
    Test environment (without truncation)         77.20              6.52              16.29            20.30
    Test environment (with truncation)            96.93              0.17               2.89            21.90

    Table 5  Navigation comparison of the A* algorithm and the DRL path planner

                               Task 1     Task 2     Task 3      Task 4
    Straight-line distance     9.65 m    10.84 m    13.85 m     35.07 m
    A* algorithm              11.68 m    11.97 m    collision   collision
    DRL path planner          11.59 m    13.39 m    18.80 m     40.26 m
Publication history
  • Received:  2024-09-20
  • Accepted:  2024-11-21
  • Published online:  2025-02-10
