2.845

2023影响因子

(CJCR)

  • 中文核心
  • EI
  • 中国科技核心
  • Scopus
  • CSCD
  • 英国科学文摘

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

基于深度强化学习的有轨电车信号优先控制

王云鹏 郭戈

王云鹏, 郭戈. 基于深度强化学习的有轨电车信号优先控制. 自动化学报, 2019, 45(12): 2366−2377 doi: 10.16383/j.aas.c190164
引用本文: 王云鹏, 郭戈. 基于深度强化学习的有轨电车信号优先控制. 自动化学报, 2019, 45(12): 2366−2377 doi: 10.16383/j.aas.c190164
Wang Yun-Peng, Guo Ge. Signal priority control for trams using deep reinforcement learning. Acta Automatica Sinica, 2019, 45(12): 2366−2377 doi: 10.16383/j.aas.c190164
Citation: Wang Yun-Peng, Guo Ge. Signal priority control for trams using deep reinforcement learning. Acta Automatica Sinica, 2019, 45(12): 2366−2377 doi: 10.16383/j.aas.c190164

基于深度强化学习的有轨电车信号优先控制

doi: 10.16383/j.aas.c190164
基金项目: 国家自然科学基金(61573077, U1808205)资助
详细信息
    作者简介:

    王云鹏:大连理工大学控制理论与控制工程专业博士研究生. 主要研究方向为智能车路协同系统. E-mail: yunpengwang0306@163.com

    郭戈:东北大学教授. 1998年获得东北大学控制理论与控制工程专业博士学位. 主要研究方向为智能交通系统, 运动目标检测跟踪网络. 本文通信作者. E-mail: geguo@yeah.net

Signal Priority Control for Trams Using Deep Reinforcement Learning

Funds: Supported by National Natural Science Foundation of China (61573077, U1808205)
  • 摘要: 现有的有轨电车信号优先控制系统存在诸多问题, 如无法适应实时交通变化、优化求解较为复杂等. 本文提出了一种基于深度强化学习的有轨电车信号优先控制策略. 不依赖于交叉口复杂交通建模, 采用实时交通信息作为输入, 在有轨电车整个通行过程中连续动态调整交通信号. 协同考虑有轨电车与社会车辆的通行需求, 在尽量保证有轨电车无需停车的同时, 降低社会车辆的通行延误. 采用深度Q网络算法进行问题求解, 并利用竞争架构、双Q网络和加权样本池改善学习性能. 基于SUMO的实验表明, 该模型能够有效地协同提高有轨电车与社会车辆的通行效率.
  • 图  1  路口示意图

    Fig.  1  Intersection diagram

    图  2  深度神经网络结构图

    Fig.  2  The structure of DNN

    图  3  有轨电车平均停车次数对比

    Fig.  3  Comparison of tram mean stops

    图  4  平均累积奖励对比

    Fig.  4  Comparison of cumulative reward

    图  5  各直行/右转车道平均停车等待时间对比

    Fig.  5  Comparison of waiting time in direct/right turn lanes

    图  6  各左转车道平均停车等待时间对比

    Fig.  6  Comparison of waiting time in left turn lanes

    图  7  两种深度强化学习模型下有轨电车平均停车次数对比

    Fig.  7  Comparison of tram mean stops under two deep reinforcement learning models

    图  8  两种深度强化学习模型下累积奖励对比

    Fig.  8  Comparison of cumulative reward under two deep reinforcement learning models

    图  9  两种深度强化学习模型下各直行/右转车道平均停车等待时间对比

    Fig.  9  Comparison of waiting time in direct/right turn lanes under two deep reinforcement learning models

    图  10  两种深度强化学习模型下各左转车道平均停车等待时间对比

    Fig.  10  Comparison of waiting time in left turn lanes under two deep reinforcement learning models

    表  1  模型参数

    Table  1  Model parameters

    参数 取值
    $N$ 20 000
    $m$ 32
    $\Delta \varepsilon$ −0.001
    $\gamma$ 0.99
    $\alpha$ 0.001
    下载: 导出CSV
  • [1] Ministry of tranport of China. Statistical bulletin on transportation industry development in 2018. [Online], available: http://xxgk.mot.gov.cn/jigou/zhghs/201904/t20190412_3186720.html, September 5, 2019
    [2] 2 Shi J G, Sun Y S, Schonfeld P, Qi J. Joint optimization of tram timetables and signal timing adjustments at intersections. Transportation Research Part C: Emerging Technologies, 2017, 83(6): 104−119
    [3] 3 Ji Y X, Tang Y, Du Y C, Zhang X. Coordinated optimization of tram trajectories with arterial signal timing resynchronization. Transportation Research Part C: Emerging Technologies, 2019, 99(4): 53−66
    [4] Little J D C, Kelson M D, Gartner N M. Maxband: a program for setting signals on arteries and triangular networks. In: Proceedings of the 60th Annual Meeting of the Transportation Research Board. Washington, USA: Transportation Research Board, 1981. 40−46
    [5] 5 Jeong Y J, Kim Y C. Tram passive signal priority strategy based on the maxband model. KSCE Journal of Civil Engineering, 2014, 18(5): 1518−1527 doi: 10.1007/s12205-014-0159-1
    [6] 6 Ma W, Zou L, An K, Gartner N H, Wang M. A partition-enabled multi-mode band approach to arterial traffic signal optimization. IEEE Transactions on Intelligent Transportation Systems, 2019, 20(1): 313−322 doi: 10.1109/TITS.2018.2815520
    [7] 7 Kim H, Cheng Y, Chang G. Variable signal progression bands for transit vehicles under dwell time uncertainty and traffic queues. IEEE Transactions on Intelligent Transportation Systems, 2019, 20(1): 109−122 doi: 10.1109/TITS.2018.2801567
    [8] 8 Ji Y X, Tang Y, Wang W, Du Y C. Tram-oriented traffic signal timing resynchronization. Journal of Advanced Transportation, 2018, 2018(1): 1−13
    [9] 9 Jacobson J, Sheffi Y. Analytical model of traffic delays under bus signal preemption: theory and application. Transportation Research Part B: Methodological, 1981, 15(2): 127−138 doi: 10.1016/0191-2615(81)90039-4
    [10] 10 Yang M, Ding J, Wang W, Ma Y Y. A coordinated signal priority strategy for modern trams on arterial streets by predicting the tram dwell time. KSCE Journal of Civil Engineering, 2018, 22(2): 823−836 doi: 10.1007/s12205-017-1187-4
    [11] 高阳, 陈世福, 陆鑫. 强化学习研究综述. 自动化学报, 2004, 30(1): 1−15 doi: 10.3969/j.issn.1003-8930.2004.01.001

    11 Gao Yang, Chen Shi-Fu, Lu Xin. Reseacrh on reinforcement learning technology: a review. Acta Automatica Sinica, 2004, 30(1): 1−15 doi: 10.3969/j.issn.1003-8930.2004.01.001
    [12] 12 Bertsekas D P. Feature-based aggregation and deep reinforcement learning: a survey and some new implementations. IEEE/CAA Journal of Automatica Sinica, 2019, 6(1): 1−31
    [13] 13 Samah E T, Abdulhai B, Abdelgawad H. Design of reinforcement learning parameters for seamless application of adaptive traffic signal control. Journal of Intelligent Transportation Systems, 2014, 18(3): 227−245 doi: 10.1080/15472450.2013.810991
    [14] 段艳杰, 吕宜生, 张杰, 赵学亮, 王飞跃. 深度学习在控制领域的研究现状与展望. 自动化学报, 2016, 42(5): 643−654

    14 Duan Yan-Jie, Lv Yi-Sheng, Zhang Jie, Zhao Xue-Liang, Wang Fei-Yue. Deep learning for control: the state of the art and prospects. Acta Automatica Sinica, 2016, 42(5): 643−654
    [15] 15 Li L, Lv Y, Wang F-Y. Traffic signal timing via deep reinforcement learning. IEEE/CAA Journal of Automatica Sinica, 2016, 3(3): 247−254
    [16] 16 Liang X, Du X, Wang G, Han Z. A deep reinforcement learning network for traffic light cycle control. IEEE Transactions on Vehicular Technology, 2019, 68(2): 1243−1253 doi: 10.1109/TVT.2018.2890726
    [17] 17 Ling K, Shalaby A. Automated transit headway control via adaptive signal priority. Journal of Advanced Transportation, 2004, 38(4): 45−67
    [18] 舒波, 李大铭, 赵新良. 基于强化学习算法的公交信号优先策略. 东北大学学报(自然科学版), 2012, 33(10): 1513−1516 doi: 10.12068/j.issn.1005-3026.2012.10.035

    18 Shu Bo, Li Da-Ming, Zhao Xin-Liang. Transit signal priority strategy based on reinforcement learning algorithm. Journal of Northeastern University (Natural Science), 2012, 33(10): 1513−1516 doi: 10.12068/j.issn.1005-3026.2012.10.035
    [19] 梁星星, 冯旸赫, 马扬, 程光权, 黄金才, 王琦等. 多agent深度强化学习综述. 自动化学报, 2019. DOI: 10.16383/j.aas.c180372

    Liang Xing-Xing, Feng Yang-He, Ma Yang, Cheng Guang-Quan, Huang Jin-Cai, Wang Qi, et al. Deep multi-agent reinforcement learning: a survey. Acta Automatica Sinica, 2019. DOI: 10.16383/j.aas.c180372
    [20] 赵英男, 刘鹏, 赵巍, 唐降龙. 深度q学习的二次主动采样方法. 自动化学报, 2019, 45(10): 1870−1882 doi: 10.3969/j.issn.1003-8930.2019.01.001

    20 Zhao Ying-Nan, Liu Peng, Zhao Wei, Tang Xiang-Long. Twice sampling method in deep Q-network. Acta Automatica Sinica, 2019, 45(10): 1870−1882 doi: 10.3969/j.issn.1003-8930.2019.01.001
    [21] Wang Z Y, Schaul T, Hessel M, Hasselt H, Lanctot M, Freitas N. Dueling network architectures for deep reinforcement learning. In: Proceedings of the 33rd International Conference on Machine Learning. New York, USA: PMLR, 2016. 1995−2003
    [22] Hasselt H V, Guez A, Silver D. Deep reinforcement learning with double Q-learning. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence, Phoenix, USA: MIT, 2015. 2094−2100
    [23] Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. In: Proceedings of the 2016 International Conference on Learning Representations 2016, San Juan, Puerto Rico: arXiv, 2016. 1−21
    [24] Lopez P A, Behrisch M, Walz L B, Erdmann J, Flotterod Y, Hilbrich R, et al. Microscopic traffic simulation using sumo. In: Proceedings of the 21st IEEE International Conference on Intelligent Transportation Systems. Hawaii, USA: IEEE, 2018. 2575−2582
    [25] 25 Islam M T, Tiwana J, Bhowmick A, Qiu T Z. Design of LRT signal priority to improve arterial traffic mobility. Journal of Transportation Engineering, 2016, 142(9): 04016034 doi: 10.1061/(ASCE)TE.1943-5436.0000831
  • 加载中
图(10) / 表(1)
计量
  • 文章访问数:  3061
  • HTML全文浏览量:  945
  • PDF下载量:  489
  • 被引次数: 0
出版历程
  • 收稿日期:  2019-03-15
  • 录用日期:  2019-09-02
  • 刊出日期:  2019-12-01

目录

    /

    返回文章
    返回