Research on T-DQN Intelligent Obstacle Avoidance Algorithm of Unmanned Surface Vehicle

Zhou Zhi-Guo, Yu Si-Yu, Yu Jia-Bao, Duan Jun-Wei, Chen Long, Chen Jun-Long

Citation: Zhou Zhi-Guo, Yu Si-Yu, Yu Jia-Bao, Duan Jun-Wei, Chen Long, Chen Jun-Long. Research on T-DQN intelligent obstacle avoidance algorithm of unmanned surface vehicle. Acta Automatica Sinica, 2023, 49(8): 1645−1655 doi: 10.16383/j.aas.c210080

doi: 10.16383/j.aas.c210080

Funds: Supported by the Equipment Pre-research Field Fund of the 13th Five-Year Plan (61403120109) and the Fundamental Research Funds for the Central Universities of Jinan University (21619412)
    Author Bio:

    ZHOU Zhi-Guo  Associate professor at the School of Information and Electronics, Beijing Institute of Technology. His research interest covers intelligent unmanned systems, information perception and navigation, and machine learning. Corresponding author of this paper. E-mail: zhiguozhou@bit.edu.cn

    YU Si-Yu  Master student at the School of Information and Electronics, Beijing Institute of Technology. Her main research interest is information perception and navigation of intelligent unmanned systems. E-mail: yusiyu3408@163.com

    YU Jia-Bao  Master student at the School of Information and Electronics, Beijing Institute of Technology. Her main research interest is information perception and navigation of intelligent unmanned systems. E-mail: 3120200722@bit.edu.cn

    DUAN Jun-Wei  Lecturer at the College of Information Science and Technology, Jinan University. His research interest covers image fusion, machine learning, and computational intelligence. E-mail: jwduan@jnu.edu.cn

    CHEN Long  Associate professor at the Faculty of Science and Technology, University of Macau. His research interest covers computational intelligence, Bayesian methods, and machine learning. E-mail: longchen@um.edu.mo

    CHEN Jun-Long  Professor at the School of Computer Science and Engineering, South China University of Technology. His research interest covers cybernetics, intelligent systems, and computational intelligence. E-mail: philipchen@scut.edu.cn

  • Abstract: As an unmanned system with broad application prospects, the unmanned surface vehicle (USV) depends critically on autonomous decision-making. Because the water-surface operating environment is relatively open, traditional obstacle-avoidance decision algorithms struggle to plan an optimal route autonomously under quantified rules, while general reinforcement learning methods converge slowly in large-scale complex environments. To address these problems, a threshold-based deep Q network obstacle avoidance algorithm (threshold deep Q network, T-DQN) is proposed, which adds a long short-term memory (LSTM) network on top of the deep Q network (DQN) to retain training information and sets a threshold on the experience replay pool to accelerate convergence. Simulation experiments in grid environments of different scales show that T-DQN converges quickly to the optimal path, reducing the overall number of convergence steps by 69.1% and 24.8% relative to Q-learning and DQN, respectively, and that the introduced threshold screening mechanism reduces the overall convergence steps by 41.1%. Obstacle-avoidance tasks in complex map scenes were further verified on a Unity 3D reinforcement learning simulation platform, and the results show that the algorithm achieves refined obstacle avoidance and intelligent, safe navigation for USVs.
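
    The two mechanisms the abstract describes can be made concrete in a few lines. Below is a minimal PyTorch sketch of one plausible reading: a Q-network with an LSTM layer inserted to retain sequence information, and a replay pool that is only sampled once it exceeds a size threshold. `LSTMQNet`, `REPLAY_THRESHOLD`, `maybe_train`, and all hyperparameters are illustrative assumptions, not the authors' implementation; in particular, the paper's threshold screening mechanism may instead filter which experiences enter the pool.

    ```python
    # Minimal sketch of the T-DQN idea from the abstract (assumed names, not
    # the authors' code): DQN + LSTM layer, plus a replay-pool size threshold.
    import random
    from collections import deque

    import torch
    import torch.nn as nn


    class LSTMQNet(nn.Module):
        """Q-network with an LSTM layer between the state encoder and Q-head."""

        def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
            super().__init__()
            self.encoder = nn.Linear(state_dim, hidden)
            self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
            self.q_head = nn.Linear(hidden, n_actions)

        def forward(self, states: torch.Tensor) -> torch.Tensor:
            # states: (batch, seq_len, state_dim) -- a short observation history
            h = torch.relu(self.encoder(states))
            out, _ = self.lstm(h)
            return self.q_head(out[:, -1])  # Q-values at the last time step


    replay = deque(maxlen=10_000)  # experience replay pool
    REPLAY_THRESHOLD = 500         # assumed pool-size threshold before learning


    def maybe_train(policy_net, target_net, optimizer,
                    batch_size: int = 32, gamma: float = 0.99) -> None:
        """Do a gradient step only once the replay pool passes the threshold."""
        if len(replay) < REPLAY_THRESHOLD:
            return  # keep collecting experience; skip the update entirely
        batch = random.sample(replay, batch_size)
        s, a, r, s2, done = map(torch.stack, zip(*batch))
        q = policy_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
        loss = nn.functional.smooth_l1_loss(q, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    ```

    Gating the gradient step on pool size is the simplest form of such a threshold: early, highly correlated transitions are accumulated but not yet learned from, which is one way a threshold could stabilize the first updates and speed convergence.
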
  • Fig. 1  T-DQN algorithm architecture

    Fig. 2  LSTM network structure

    Fig. 3  Network layer structure after adding LSTM

    Fig. 4  Flow chart of USV path planning

    Fig. 5  Actual parameters of the USV

    Fig. 6  Path results after T-DQN training on a 10 × 10 grid map

    Fig. 7  Path results after T-DQN training on a 20 × 20 grid map

    Fig. 8  Path results after T-DQN training on a 30 × 30 grid map

    Fig. 9  Comparison of the average return values of the 4 algorithms on 10 × 10, 20 × 20, and 30 × 30 grid maps

    Fig. 10  Spaitlab-unity simulation experiment platform

    Fig. 11  Global path planning simulation trajectory of the USV

    Fig. 12  Global path planning in grid water space

    Fig. 13  Comparison of global/local path planning simulation trajectories of the USV

    Table 1  Comparison of convergence steps of the 4 algorithms

    Algorithm     10 × 10 grid map   20 × 20 grid map   30 × 30 grid map
    Q-learning    888                > 2000             > 2000
    DQN           317                600                > 2000
    LSTM + DQN    750                705                850
    T-DQN         400                442                517

Publication history
  • Received: 2021-01-25
  • Published online: 2021-07-31
  • Issue published: 2023-08-21
