2.845

2023影响因子

(CJCR)

  • 中文核心
  • EI
  • 中国科技核心
  • Scopus
  • CSCD
  • 英国科学文摘

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

基于滚动时域强化学习的智能车辆侧向控制算法

张兴龙 陆阳 李文璋 徐昕

张兴龙, 陆阳, 李文璋, 徐昕. 基于滚动时域强化学习的智能车辆侧向控制算法. 自动化学报, 2023, 49(12): 2481−2492 doi: 10.16383/j.aas.c210555
引用本文: 张兴龙, 陆阳, 李文璋, 徐昕. 基于滚动时域强化学习的智能车辆侧向控制算法. 自动化学报, 2023, 49(12): 2481−2492 doi: 10.16383/j.aas.c210555
Zhang Xing-Long, Lu Yang, Li Wen-Zhang, Xu Xin. Receding horizon reinforcement learning algorithm for lateral control of intelligent vehicles. Acta Automatica Sinica, 2023, 49(12): 2481−2492 doi: 10.16383/j.aas.c210555
Citation: Zhang Xing-Long, Lu Yang, Li Wen-Zhang, Xu Xin. Receding horizon reinforcement learning algorithm for lateral control of intelligent vehicles. Acta Automatica Sinica, 2023, 49(12): 2481−2492 doi: 10.16383/j.aas.c210555

基于滚动时域强化学习的智能车辆侧向控制算法

doi: 10.16383/j.aas.c210555
基金项目: 国家重点研究发展计划 (2018YFB1305105), 国家自然科学基金(62003361, 61825305) 资助
详细信息
    作者简介:

    张兴龙:国防科技大学智能科学学院副研究员. 2018年获得意大利米兰理工大学博士学位. 主要研究方向为滚动时域强化学习及其在无人系统中的应用. E-mail: zhangxinglong18@nudt.edu.cn

    陆阳:国防科技大学智能科学学院博士研究生. 2020年获得国防科技大学硕士学位, 2018年获得山东大学学士学位. 主要研究方向为强化学习及其在无人系统中的应用. E-mail: luyang18@nudt.edu.cn

    李文璋:2018年获得北京理工大学学士学位, 2020年获得国防科技大学硕士学位. 主要研究方向为智能车学习控制. E-mail: 15624953231@163.com

    徐昕:国防科技大学智能科学学院研究员. 2002年获得国防科技大学机电与自动化学院控制科学与工程博士学位. 主要研究方向为智能控制, 强化学习, 近似动态规划, 机器学习, 机器人和智能驾驶. 本文通信作者. E-mail: xinxu@nudt.edu.cn

Receding Horizon Reinforcement Learning Algorithm for Lateral Control of Intelligent Vehicles

Funds: Supported by National Key Research and Development Program of China (2018YFB1305105) and National Natural Science Foundation of China (62003361, 61825305)
More Information
    Author Bio:

    ZHANG Xing-Long Associate professor at the College of Intelligence Science and Technology, National University of Defense Technology. He received his Ph.D. degree from Politecnico di Milano, Italy, in 2018. His research interest covers receding horizon reinforcement learning and its application in unmanned systems

    LU Yang Ph.D. candidate at the College of Intelligence Science and Technology, National University of Defense Technology. He received his master and bachelor degrees from National University of Defense Technology in 2020 and Shandong University in 2018, respectively. His research interest covers reinforcement learning and its application in unmanned systems

    LI Wen-Zhang Received his bachelor and master degrees from National University of Defense Technology in 2018 and Beijing Institute of Technology in 2020, respectively. His research interest covers learning control of intelligent vehicles

    XU Xin Professor at the College of Intelligence Science and Technology, National University of Defense Technology. He received his Ph.D. degree in control science and engineering from the College of Mechatronics and Automation, National University of Defense Technology in 2002. His research interest covers intelligent control, reinforcement learning, approximate dynamic programming, machine learning, robotics, and autonomous vehicles. Corresponding author of this paper

  • 摘要: 针对智能车辆的高精度侧向控制问题, 提出一种基于滚动时域强化学习(Receding horizon reinforcement learning, RHRL)的侧向控制方法. 车辆的侧向控制量由前馈和反馈两部分构成, 前馈控制量由参考路径的曲率以及动力学模型直接计算得出; 而反馈控制量通过采用滚动时域强化学习算法求解最优跟踪控制问题得到. 提出的方法结合滚动时域优化机制, 将无限时域最优控制问题转化为若干有限时域控制问题进行求解. 与已有的有限时域执行器−评价器学习不同, 在每个预测时域采用时间独立型执行器−评价器网络结构学习最优值函数和控制策略. 与模型预测控制(Model predictive control, MPC)方法求解开环控制序列不同, RHRL控制器的输出是一个显式状态反馈控制律, 兼具直接离线部署和在线学习部署的能力. 此外, 从理论上证明了RHRL算法在每个预测时域的收敛性, 并分析了闭环系统的稳定性. 在仿真环境中完成了结构化道路下的车辆侧向控制测试. 仿真结果表明, 提出的RHRL方法在控制性能方面优于现有先进算法, 最后, 以红旗E-HS3电动汽车作为实车平台, 在封闭结构化城市测试道路和乡村起伏砂石道路下进行了侧向控制实验. 实验结果显示, RHRL在结构化城市道路中的侧向控制性能优于预瞄控制, 在乡村道路中具有较强的路面适应能力和较好的控制性能.
  • 图  1  智能车辆二自由度侧向模型

    Fig.  1  Two-degree-of-freedom lateral model of intelligent vehicle

    图  2  侧向误差模型

    Fig.  2  Lateral error model

    图  3  智能车侧向控制框图

    Fig.  3  Lateral control diagram of intelligent vehicle

    图  4  参考路径

    Fig.  4  Reference path

    图  5  30 ${\rm{km/h}}$下智能车跟踪控制侧向偏差对比

    Fig.  5  Comparison of lateral tracking error of intelligent vehicles under $v_x = 30 \; {\rm{km/h}}$

    图  6  50 ${\rm{km/h}}$下智能车跟踪控制侧向偏差对比

    Fig.  6  Comparison of lateral tracking error of intelligent vehicles under $v_x = 50 \; {\rm{km/h }}$

    图  7  红旗E-HS3智能驾驶平台

    Fig.  7  Hongqi E-HS3 intelligent driving platform

    图  8  基于RHRL和纯点预瞄方法的红旗E-HS3行驶路径

    Fig.  8  Path of Hongqi E-HS3 vehicle controlled by RHRL and pure pursuit methods

    图  9  RHRL与纯点预瞄方法的车辆实测侧向偏差对比

    Fig.  9  Comparison of experimental lateral tracking error of the RHRL and pure pursuit methods

    图  10  乡村砂石道路地图和车辆行驶中各阶段状态

    Fig.  10  The route map in the country sand and gravel road, and the status of different stages in the control process

    图  11  侧向误差曲线

    Fig.  11  Curves of the lateral error

    表  1  车辆动力学参数

    Table  1  The parameters of the vehicle dynamics

    符号 物理意义 数值 单位
    $m$ 车身质量 1723 kg
    $I_z$ 转动惯量 4175 kg·m2
    $l_f$ 质心到前轴距离 1.232 m
    $l_r$ 质心到后轴距离 1.468 m
    $C_f$ 前轮侧偏刚度 66900 N/rad
    $C_r$ 后轮侧偏刚度 62700 N/rad
    下载: 导出CSV

    表  2  各控制器的均方根误差对比

    Table  2  The RMSE comparison among all the controllers

    方法 vx = 30 km/h vx = 50 km/h
    ey (m) $e_\varphi$(rad) ey (m) $e_\varphi $(rad)
    RHRL 0.156 0.030 0.246 0.020
    HDP 0.165 0.030 0.315 0.019
    SAC 0.189 0.029 0.283 0.017
    DDPG 0.172 0.037 0.319 0.017
    MPC 0.212 0.025 0.278 0.015
    纯点预瞄 0.159 0.036 0.286 0.030
    下载: 导出CSV
  • [1] 熊璐, 杨兴, 卓桂荣, 冷搏, 章仁夑. 无人驾驶车辆的运动控制发展现状综述. 机械工程学报, 2020, 56(10): 127-143 doi: 10.3901/JME.2020.10.127

    Xiong Lu, Yang Xing, Zhuo Gui-Rong, Leng Bo, Zhang Ren-Xie. Review on motion control of autonomous vehicles. Journal of Mechanical Engineering, 2020, 56(10): 127-143 doi: 10.3901/JME.2020.10.127
    [2] 陈虹, 郭露露, 宫洵, 高炳钊, 张琳. 智能时代的汽车控制. 自动化学报, 2020, 46(7): 1313-1332 doi: 10.16383/j.aas.c190329

    Chen Hong, Guo Lu-Lu, Gong Xun, Gao Bing-Zhao, Zhang Lin. Automative control in intelligent era. Acta Automatica Sinica, 2020, 46(7): 1313-1332 doi: 10.16383/j.aas.c190329
    [3] 田涛涛, 侯忠生, 刘世达, 邓志东. 基于无模型自适应控制的无人驾驶汽车横向控制方法. 自动化学报, 2017, 43(11): 1931-1940 doi: 10.16383/j.aas.2017.c160633

    Tian Tao-Tao, Hou Zhong-Sheng, Liu Shi-Da, Deng Zhi-Dong. Model-free adaptive control based lateral control of self-driving car. Acta Automatica Sinica, 2017, 43(11): 1931-1940 doi: 10.16383/j.aas.2017.c160633
    [4] Ahmed A A, Alshandoli A F S. Using of neural network controller and fuzzy PID control to improve electric vehicle stability based on A14-DOF model. In: Proceedings of the International Conference on Electrical Engineering (ICEE). Istanbul, Turkey: IEEE, 2020. 1−6
    [5] Meng J, Liu A B, Yang Y Q, Wu Z, Xu Q Y. Two-wheeled robot platform based on PID control. In: Proceedings of the 5th International Conference on Information Science and Control Engineering (ICISCE). Zhengzhou, China: IEEE, 2018. 1011−1014
    [6] Farag W. Complex trajectory tracking using PID control for autonomous driving. International Journal of Intelligent Transportation Systems Research, 2020, 18(2): 356-366 doi: 10.1007/s13177-019-00204-2
    [7] Zhao P, Chen J J, Song Y, Tao X, Xu T J, Mei T. Design of a control system for an autonomous vehicle based on adaptive-PID. International Journal of Advanced Robotic Systems, 2012, 9(2): Article No. 44 doi: 10.5772/51314
    [8] Han G N, Fu W P, Wang W, Wu Z S. The lateral tracking control for the intelligent vehicle based on adaptive PID neural network. Sensors, 2017, 17(6): Article No. 1244 doi: 10.3390/s17061244
    [9] Fraichard T, Garnier P. Fuzzy control to drive car-like vehicles. Robotics and Autonomous Systems, 2001, 34(1): 1-22 doi: 10.1016/S0921-8890(00)00096-8
    [10] Pérez J, Milanés V, Onieva E. Cascade architecture for lateral control in autonomous vehicles. IEEE Transactions on Intelligent Transportation Systems, 2011, 12(1): 73-82 doi: 10.1109/TITS.2010.2060722
    [11] Li H M, Wang X B, Song S B, Li H. Vehicle control strategies analysis based on PID and fuzzy logic control. Procedia Engineering, 2016, 137: 234-243 doi: 10.1016/j.proeng.2016.01.255
    [12] Park M W, Lee S W, Han W Y. Development of lateral control system for autonomous vehicle based on adaptive pure pursuit algorithm. In: Proceedings of the 14th International Conference on Control, Automation and Systems (ICCAS). Gyeonggi-do, South Korea: IEEE, 2014. 1443−1447
    [13] 郭景华, 胡平, 李琳辉, 王荣本, 张明恒, 郭烈. 基于遗传优化的无人车横向模糊控制. 机械工程学报, 2012, 48(6): 76-82 doi: 10.3901/JME.2012.06.076

    Guo Jing-Hua, Hu Ping, Li Lin-Hui, Wang Rong-Ben, Zhang Ming-Heng, Guo Lie. Study on lateral fuzzy control of unmanned vehicles via genetic algorithms. Chinese Journal of Mechanical Engineering, 2012, 48(6): 76-82 doi: 10.3901/JME.2012.06.076
    [14] Leonard J, How J, Teller S, Berger M, Campbell S, Fiore G, et al. A perception-driven autonomous urban vehicle. Journal of Field Robotics, 2008, 25(10): 727-774 doi: 10.1002/rob.20262
    [15] Rajamani R, Zhu C, Alexander L. Lateral control of a backward driven front-steering vehicle. Control Engineering Practice, 2003, 11(5): 531-540 doi: 10.1016/S0967-0661(02)00143-0
    [16] Thrun S, Montemerlo M, Dahlkamp H, Stavens D, Aron A, Diebel J, et al. Stanley: The robot that won the DARPA grand challenge. Journal of Field Robotics, 2006, 23(9): 661-692 doi: 10.1002/rob.20147
    [17] 龚建伟, 姜岩, 徐威. 无人驾驶车辆模型预测控制. 北京: 北京理工大学出版社, 2014.

    Gong Jian-Wei, Jiang Yan, Xu Wei. Model Predictive Control for Self-Driving Vehicles. Beijing: Beijing Institute of Technology Press, 2014.
    [18] Falcone P, Borrelli F, Asgari J, Tseng H E, Hrovat D. Predictive active steering control for autonomous vehicle systems. IEEE Transactions on Control Systems Technology, 2007, 15(3): 566-580 doi: 10.1109/TCST.2007.894653
    [19] Carvalho A, Gao Y Q, Gray A, Tseng H E, Borrelli F. Predictive control of an autonomous ground vehicle using an iterative linearization approach. In: Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC). The Hague, The Netherlands: IEEE, 2013. 2335−2340
    [20] Beal C E, Gerdes J C. Model predictive control for vehicle stabilization at the limits of handling. IEEE Transactions on Control Systems Technology, 2013, 21(4): 1258-1269 doi: 10.1109/TCST.2012.2200826
    [21] Liniger A, Domahidi A, Morari M. Optimization-based autonomous racing of 1: 43 scale RC cars. Optimal Control Applications and Methods, 2015, 36(5): 628-647 doi: 10.1002/oca.2123
    [22] Kabzan J, Hewing L, Liniger A, Zeilinger M N. Learning-based model predictive control for autonomous racing. IEEE Robotics and Automation Letters, 2019, 4(4): 3363-3370 doi: 10.1109/LRA.2019.2926677
    [23] Ostafew C J, Schoellig A P, Barfoot T D. Robust constrained learning-based NMPC enabling reliable mobile robot path tracking. The International Journal of Robotics Research, 2016, 35(13): 1547-1563 doi: 10.1177/0278364916645661
    [24] 由智恒. 基于MPC算法的无人驾驶车辆轨迹跟踪控制研究[硕士学位论文], 吉林大学, 中国, 2018.

    You Zhi-Heng. Research on Model Predictive Control-based Trajectory Tracking for Unmanned Vehicles [Master thesis], Jilin University, China, 2018.
    [25] 王鼎. 基于学习的鲁棒自适应评判控制研究进展. 自动化学报, 2019, 45(6): 1031-1043 doi: 10.16383/j.aas.c170701

    Wang Ding. Research progress on learning-based robust adaptive critic control. Acta Automatica Sinica, 2019, 45(6): 1031-1043 doi: 10.16383/j.aas.c170701
    [26] Wang D, Ha M M, Qiao J F. Data-driven iterative adaptive critic control toward an urban wastewater treatment plant. IEEE Transactions on Industrial Electronics, 2021, 68(8): 7362-7369 doi: 10.1109/TIE.2020.3001840
    [27] Oh S Y, Lee J H, Choi D H. A new reinforcement learning vehicle control architecture for vision-based road following. IEEE Transactions on Vehicular Technology, 2000, 49(3): 997-1005 doi: 10.1109/25.845116
    [28] 杨慧媛. 基于增强学习的优化控制方法及其在移动机器人中的应用[硕士学位论文], 国防科学技术大学, 中国, 2014.

    Yang Hui-Yuan. Reinforcement Learning-Based Optimal Control Methods with Applications to Mobile Robots [Master thesis], National University of Defense Technology, China, 2014.
    [29] 连传强. 基于近似动态规划的优化控制方法及在自主驾驶车辆中的应用[博士学位论文], 国防科学技术大学, 中国, 2016.

    Lian Chuan-Qiang. Optimization Control Methods Based on Approximate Dynamic Programming and Its Applications in Autonomous Land Vehicles [Ph.D. dissertation], National University of Defense Technology, China, 2016.
    [30] 黄振华. 智能驾驶车辆自评价学习控制方法研究[博士学位论文], 国防科学技术大学, 中国, 2017.

    Huang Zhen-Hua. Researches on Adaptive Critic Learning Control Approaches for Intelligent Driving Vehicles [Ph.D. dissertation], National University of Defense Technology, China, 2017.
    [31] Lian C Q, Xu X, Chen H, He H B. Near-optimal tracking control of mobile robots via receding-horizon dual heuristic programming. IEEE Transactions on Cybernetics, 2016, 46(11): 2484-2496 doi: 10.1109/TCYB.2015.2478857
    [32] Kuutti S, Bowden R, Jin Y C, Barber P, Fallah S. A survey of deep learning applications to autonomous vehicle control. IEEE Transactions on Intelligent Transportation Systems, 2021, 22(2): 712-733 doi: 10.1109/TITS.2019.2962338
    [33] Li D, Zhao D B, Zhang Q C, Chen Y R. Reinforcement learning and deep learning based lateral control for autonomous driving[Application Notes]. IEEE Computational Intelligence Magazine, 2019, 14(2): 83-98 doi: 10.1109/MCI.2019.2901089
    [34] Chen Y X, Hereid A, Peng H E, Grizzle J. Enhancing the performance of a safe controller via supervised learning for truck lateral control. Journal of Dynamic Systems, Measurement, and Control, 2019, 141(10): Article No. 101005 doi: 10.1115/1.4043487
    [35] Dong L, Yan J, Yuan X, He H B, Sun C Y. Functional nonlinear model predictive control based on adaptive dynamic programming. IEEE Transactions on Cybernetics, 2019, 49(12): 4206-4218 doi: 10.1109/TCYB.2018.2859801
    [36] Rajamani R. Vehicle Dynamics and Control (Second edition). New York: Springer, 2012.
    [37] Xu X, Chen H, Lian C Q, Li D Z. Learning-based predictive control for discrete-time nonlinear systems with stochastic disturbances. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(12): 6202-6213 doi: 10.1109/TNNLS.2018.2820019
    [38] Chmielewski D, Manousiouthakis V. On constrained infinite-time linear quadratic optimal control. Systems & Control Letters, 1996, 29(3): 121-129
    [39] Rawlings J B, Mayne D Q, Diehl M M. Model Predictive Control: Theory, Computation, and Design (Second edition). Madison: Nob Hill Publishing, 2017.
    [40] Mayne D Q, Kerrigan E C, van Wyk E J, Falugi P. Tube-based robust nonlinear model predictive control. International Journal of Robust and Nonlinear Control, 2011, 21(11): 1341-1353 doi: 10.1002/rnc.1758
    [41] Zhang X L, Pan W, Scattolini R, Yu S Y, Xu X. Robust tube-based model predictive control with Koopman operators. Automatica, 2022, 137: Article No. 110114 doi: 10.1016/j.automatica.2021.110114
    [42] Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Proceedings of the 35th International Conference on Machine Learning (ICML). Stockholm, Sweden: PMLR, 2018. 1856−1865
    [43] Lillicrap T P, Hunt J J, Pritzel A, Heess N, Erez T, Tassa Y, et al. Continuous control with deep reinforcement learning. In: Proceedings of the 4th International Conference on Learning Representations (ICLR). San Juan, Puerto Rico: 2016.
    [44] Snider J M. Automatic Steering Methods for Autonomous Automobile Path Tracking, Technical Report CMU-RI-TR-09-08, Robotics Institute, Carnegie Mellon University, USA, 2009.
  • 加载中
图(11) / 表(2)
计量
  • 文章访问数:  2213
  • HTML全文浏览量:  1254
  • PDF下载量:  488
  • 被引次数: 0
出版历程
  • 收稿日期:  2021-06-20
  • 录用日期:  2021-11-02
  • 网络出版日期:  2022-03-07
  • 刊出日期:  2023-12-27

目录

    /

    返回文章
    返回