Autonomous Perception-Planning-Control Strategy Based on Deep Reinforcement Learning for Unmanned Aerial Vehicles
Abstract: In recent years, with the rapid development of deep reinforcement learning (DRL), its application to autonomous navigation of unmanned aerial vehicles (UAVs) has attracted increasing attention. However, in complex and unknown environments, existing DRL-based UAV autonomous navigation algorithms are often limited by their dependence on global information and by the constraints of the specific environments they were trained in, which greatly restricts their applicability across scenarios. To address these issues, a multi-scale input is proposed to balance the receptive field against the state dimension, and a truncation operation is introduced so that the agent can operate in expanded environments. In addition, an autonomous perception-planning-control architecture is constructed, giving the UAV the ability to navigate autonomously in diverse and complex environments.
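To make these two operations concrete, the sketch below (hypothetical function names, grid sizes, crop scales, and clipping range; the paper's exact construction is given by its equations) builds a multi-scale egocentric observation by cropping the occupancy map at several scales and pooling each crop to a fixed resolution, and truncates the relative goal vector so that states generated in an expanded environment remain within the range seen during training.

    import numpy as np

    def multi_scale_state(voxel_map, pos, crop_sizes=(8, 16, 32), out_size=8):
        # Crop the 3-D occupancy map around pos at growing scales and max-pool
        # each crop to a fixed out_size, so the receptive field grows while the
        # state dimension stays constant (illustrative, not the paper's exact form)
        scales = []
        for c in crop_sizes:
            half = c // 2
            padded = np.pad(voxel_map, half, constant_values=1)  # outside = occupied
            x, y, z = np.asarray(pos, dtype=int) + half
            crop = padded[x - half:x + half, y - half:y + half, z - half:z + half]
            k = c // out_size                                    # pooling factor
            pooled = crop.reshape(out_size, k, out_size, k,
                                  out_size, k).max(axis=(1, 3, 5))
            scales.append(pooled.astype(np.float32))
        return np.stack(scales)                                  # (n_scales, 8, 8, 8)

    def truncate_goal(pos, goal, max_range=10.0):
        # Clip the relative goal vector to the range covered in training so the
        # policy still sees in-distribution states in larger environments
        rel = np.asarray(goal, dtype=np.float32) - np.asarray(pos, dtype=np.float32)
        norm = np.linalg.norm(rel)
        return rel if norm <= max_range else rel * (max_range / norm)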
Table 1 Pseudocode for the Double DQN algorithm
Algorithm 1. Double DQN for UAV path planning
Input: environment $\tilde X$; initialized network parameters $\alpha$
1: for episode = 1, 2, ... do
2:   Initialize the UAV start position ${\tilde p_\text{init}} \sim U({\tilde M_\text{free}})$
3:   Initialize the UAV goal position ${\tilde p_\text{d}} \sim U({\tilde M_\text{free}} \setminus \{{\tilde p_\text{init}}\})$
4:   for time step $t = 0, 1, \ldots$ do
5:     With probability $\varepsilon$, select a uniformly random action $a_t \sim U(A)$
6:     Otherwise, select the greedy action $a_t = \arg\max_a Q(s_t, a)$
7:     Execute $a_t$; observe the next state $s_{t+1}$ and the reward $r_t$
8:     Store the transition $(s_t, a_t, s_{t+1}, r_t)$ in the replay buffer $D$
9:     Uniformly sample $N$ transitions $(s_i, a_i, s_{i+1}, r_i)$, $1 \le i \le N$
10:    For each sampled transition, update the main-network parameters $\alpha$ with the loss in Eq. (7)
11:    Every $C$ training steps, copy the main network into the target network: $\hat \alpha \leftarrow \alpha$
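As a concrete illustration of steps 6 and 10, the following is a minimal sketch of one Double DQN update in PyTorch. It assumes a generic Q-network and uses the standard Double DQN target; the paper's actual loss is defined by Eq. (7), which may differ in details such as the choice of loss function.

    import torch
    import torch.nn.functional as F

    def double_dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
        # batch: states (N, ...), actions (N,), rewards (N,), next states (N, ...),
        # and done flags (N,) as tensors; names here are illustrative
        s, a, r, s_next, done = batch
        # The online network selects the greedy next action ...
        next_a = q_net(s_next).argmax(dim=1, keepdim=True)
        # ... and the target network evaluates it, reducing Q-value overestimation
        next_q = target_net(s_next).gather(1, next_a).squeeze(1).detach()
        target = r + gamma * (1.0 - done) * next_q
        q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
        loss = F.smooth_l1_loss(q, target)   # stand-in for the loss in Eq. (7)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Step 11's periodic synchronization then amounts to calling target_net.load_state_dict(q_net.state_dict()) every $C$ updates.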
Table 2 Pseudocode for the UAV autonomous navigation technique
Algorithm 2. UAV autonomous navigation
Input: environment $X$; network parameters $\alpha$; UAV initial position ${p_\text{init}}$; desired position ${p_\text{d}}$
1: Initialize the sensing, planning, and flight-control timers ${t^\text{sense}}$, ${t^\text{plan}}$, and ${t^\text{control}}$ to zero
2: Initialize the voxel environment $\tilde X$ as obstacle-free space
3: while the UAV has not reached the goal, i.e., $\|p - p_\text{d}\|_2$ exceeds the arrival threshold, do
4:   Run the path-planning and trajectory-optimization module:
5:     Initialize the discrete waypoint list $\tilde P = [{\tilde p_0}]$ with ${\tilde p_0} = \text{floor}(p/2\tilde d)$
6:     for time step $t = 0, 1, \ldots, x-1$ do
7:       Select the action $a_t = \arg\max_a Q(s_t, a)$ using the network parameters $\alpha$
8:       Execute $a_t$ in the voxel environment
9:       Append the resulting waypoint: $\tilde P.\text{append}({\tilde p_{t+1}})$
10:    Optimize the flight trajectory $\Gamma$ with Eq. (13)
11:    Reset the flight-control timer: ${t^\text{control}} = 0$
12:  Update the planning timer: ${t^\text{plan}} \mathrel{+}= \Delta t$
13:  if the sensing condition $\bmod({t^\text{sense}},\; {T^\text{sense}}) == 0$ holds then
14:    Acquire the newly sensed obstacle map ${\tilde M'_\text{obs}}$
15:    if the sensed obstacle map conflicts with the planned path, i.e., ${\tilde M_\text{obs}} \cap \tilde P \ne \emptyset$, then
16:      Reset the planning timer: ${t^\text{plan}} = 0$
17:    Update the obstacle map: ${\tilde M_\text{obs}} = {\tilde M_\text{obs}} \cup {\tilde M'_\text{obs}}$
18:  Update the sensing timer: ${t^\text{sense}} \mathrel{+}= \Delta t$
19:  if the trajectory-optimization condition ${t^\text{control}} == 0$ holds then
20:    Re-optimize the flight trajectory $\Gamma$
21:    Generate the PWM signals
22:  Update the flight-control timer: ${t^\text{control}} \mathrel{+}= \Delta t$
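Lines 5-9 of Algorithm 2 reuse the trained Q-network as a path planner by rolling it out greedily on the voxel grid. Below is a minimal sketch, assuming a 6-connected action set and a user-supplied make_state function that builds the (multi-scale) state for a voxel position; both are illustrative stand-ins for the paper's definitions.

    import numpy as np
    import torch

    # 6-connected voxel moves; the paper's action set A may differ
    ACTIONS = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                        [0, -1, 0], [0, 0, 1], [0, 0, -1]])

    @torch.no_grad()
    def rollout_waypoints(q_net, make_state, p0, goal, max_steps):
        # Start from the voxelized position p0 (line 5) and grow the discrete
        # waypoint list by repeatedly taking the greedy action (lines 6-9)
        path = [np.asarray(p0)]
        for _ in range(max_steps):
            s = torch.as_tensor(make_state(path[-1], goal)).unsqueeze(0)
            a = int(q_net(s).argmax(dim=1))        # line 7: argmax_a Q(s_t, a)
            path.append(path[-1] + ACTIONS[a])     # lines 8-9: step and append
            if np.array_equal(path[-1], goal):     # reached the goal voxel
                break
        return path                                # discrete waypoints for Eq. (13)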
Table 3 Navigation performance of agents under different scale inputs after iterative training
                     Success rate (%)   Collision rate (%)   Failure rate (%)   Path length (m)
Scale I state             89.61               0.02                10.37              8.73
Scale II state            88.25               0.72                11.03              8.79
Scale III state            0.51              17.66                81.83             11.56
Multi-scale state         97.19               0.08                 2.74              8.62
Table 4 Navigation performance in different voxel environments
                                         Success rate (%)   Collision rate (%)   Failure rate (%)   Path length (m)
Training environment                          97.19               0.08                 2.74              8.62
Test environment (without truncation)         77.20               6.52                16.29             20.30
Test environment (with truncation)            96.93               0.17                 2.89             21.90
Table 5 Navigation comparison between the A* algorithm and the DRL path planner
                          Task 1     Task 2     Task 3      Task 4
Straight-line distance    9.65 m     10.84 m    13.85 m     35.07 m
A* algorithm              11.68 m    11.97 m    collision   collision
DRL path planner          11.59 m    13.39 m    18.80 m     40.26 m