Virtual Tube Visual Obstacle Avoidance for UAV Based on Deep Reinforcement Learning
Abstract: To address autonomous obstacle avoidance for unmanned aerial vehicles (UAVs) flying within a virtual tube, this paper proposes an autonomous learning architecture based on visual sensors, in which a novel reward function is introduced and an end-to-end deep reinforcement learning control strategy is designed. By combining the strengths of convolutional and recurrent neural networks, a dual-network architecture is constructed that reduces network complexity and processes the UAV's depth images effectively. A three-dimensional experimental environment is further built in the AirSim simulator, and a continuous action space is adopted to smooth the UAV's flight trajectories. Simulation results show that, compared with existing methods, the model converges faster in training and achieves higher average rewards when facing static and dynamic obstacles, increasing task completion rates by 9.4% and 19.98%, respectively, and effectively realizing fine-grained obstacle avoidance and autonomous safe navigation.
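For context on the perception pipeline the abstract describes, depth observations can be pulled from AirSim's Python API roughly as follows. This is a minimal sketch, not the paper's code: the normalization constant, the use of cv2 for resizing, and the ordering of preprocessing steps are our assumptions; takeoff and episode-reset handling are omitted.

```python
import airsim
import numpy as np
import cv2  # assumed available for resizing to the 84x84 network input

client = airsim.MultirotorClient()
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)

def get_depth_observation(max_depth=20.0):
    """Fetch one depth image and normalize it for the policy network."""
    responses = client.simGetImages([
        airsim.ImageRequest("0", airsim.ImageType.DepthPerspective,
                            pixels_as_float=True)
    ])
    r = responses[0]
    depth = airsim.list_to_2d_float_array(r.image_data_float, r.width, r.height)
    depth = np.clip(depth, 0.0, max_depth) / max_depth   # scale to [0, 1]
    return cv2.resize(depth, (84, 84)).astype(np.float32)

# Continuous action space: a velocity command held for one control interval.
client.moveByVelocityAsync(vx=1.0, vy=0.0, vz=0.0, duration=0.1).join()
```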
Table 1 CNN network structure
Layer        Input size  Kernel  Kernels  Stride  Activation  Output size
CNN1         84×84×1     8×8     32       4       ReLU        20×20×32
MaxPooling1  20×20×32    2×2     /        2       /           10×10×32
CNN2         10×10×32    3×3     64       1       ReLU        8×8×64
MaxPooling2  8×8×64      2×2     /        2       /           4×4×64
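The layer dimensions in Table 1 can be written down directly. The PyTorch sketch below reproduces them exactly, and appends a GRU layer to stand in for the recurrent half of the dual network; the module names (DepthEncoder, RecurrentPolicy), the GRU hidden size, and the action dimension are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class DepthEncoder(nn.Module):
    """CNN branch from Table 1: 84x84x1 depth image -> 4x4x64 feature map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4),    # 84x84x1 -> 20x20x32
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),        # -> 10x10x32
            nn.Conv2d(32, 64, kernel_size=3, stride=1),   # -> 8x8x64
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),        # -> 4x4x64
        )

    def forward(self, x):              # x: (B, 1, 84, 84)
        return self.net(x).flatten(1)  # (B, 1024)

class RecurrentPolicy(nn.Module):
    """Illustrative CNN+GRU dual network with actor and critic heads."""
    def __init__(self, action_dim=3, hidden=256):
        super().__init__()
        self.encoder = DepthEncoder()
        self.gru = nn.GRU(input_size=64 * 4 * 4, hidden_size=hidden,
                          batch_first=True)
        self.mu = nn.Linear(hidden, action_dim)  # mean of continuous action
        self.value = nn.Linear(hidden, 1)        # critic head

    def forward(self, frames, h0=None):
        # frames: (B, T, 1, 84, 84) sequence of depth images
        B, T = frames.shape[:2]
        feats = self.encoder(frames.reshape(B * T, 1, 84, 84)).reshape(B, T, -1)
        out, hT = self.gru(feats, h0)
        return torch.tanh(self.mu(out)), self.value(out), hT
```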
Table 2 Parameter settings
Parameter        Value
Learning rate    0.0001
Optimizer        Adam
Discount factor  0.99
Clip value       0.2
Batch size       128
Entropy weight   0.02
GAE weight       0.95
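The entries in Table 2 map onto the standard PPO machinery. As a minimal sketch under that assumption (not the paper's exact training loop), generalized advantage estimation with γ = 0.99 and λ = 0.95, and the clipped surrogate loss with clip value 0.2 and entropy weight 0.02, look like this:

```python
import torch

def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout.
    rewards, values, dones: 1-D tensors of length T; the value after the
    final step is bootstrapped as zero for simplicity."""
    T = rewards.shape[0]
    adv = torch.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        next_v = values[t + 1] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * next_v * (1 - dones[t]) - values[t]
        last = delta + gamma * lam * (1 - dones[t]) * last
        adv[t] = last
    return adv

def ppo_loss(logp_new, logp_old, adv, entropy, clip=0.2, ent_w=0.02):
    """Clipped PPO surrogate with the entropy bonus from Table 2."""
    ratio = torch.exp(logp_new - logp_old)
    surr = torch.min(ratio * adv, torch.clamp(ratio, 1 - clip, 1 + clip) * adv)
    return -(surr.mean() + ent_w * entropy.mean())
```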
Table 3 Test success rate in obstacle-free environment
Algorithm            Mean score  Score std.  Success rate (%)
CPPO-1               21.31       7.29        97.00
CPPO-1 (high noise)  20.71       8.98        96.67
CPPO-2               22.65       0.21        100.00
CPPO-2 (high noise)  22.64       0.21        100.00
Table 4 Test success rate in static obstacle environment
Algorithm            Mean score  Score std.  Success rate (%)
CPPO-1               13.96       17.09       81.08
CPPO-1 (high noise)  12.53       18.16       78.60
CPPO-2               20.26       9.32        90.52
CPPO-2 (high noise)  17.84       13.76       88.93
Table 5 Test success rate in dynamic obstacle environment
Algorithm               Mean score  Score std.  Success rate (%)
CPPO-1                  7.52        19.70       65.34
RCPPO-N                 12.34       17.34       78.47
RCPPO-N (high dynamic)  11.02       18.06       74.73
RCPPO                   15.61       14.56       85.32
RCPPO (high dynamic)    15.02       16.02       82.63

Table 6 RCPPO generalization test success rate
Simulation environment  Mean score  Score std.  Success rate (%)
Obstacle-free           22.61       0.28        100.00
Static obstacles        17.70       13.07       89.36
Dynamic obstacles       15.61       14.56       85.32
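Tables 3–6 each report per-environment episode statistics. A helper of the following shape reproduces the three reported columns from raw test episodes; it is our own illustration of the aggregation, since the paper does not list its evaluation script.

```python
import statistics

def summarize(episodes):
    """episodes: list of (score, reached_goal) pairs from test runs.
    Returns (mean score, score std., success rate in %)."""
    scores = [s for s, _ in episodes]
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    success = 100.0 * sum(ok for _, ok in episodes) / len(episodes)
    return mean, std, success
```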