Joint Regression Object Localization Based on Deep Reinforcement Learning

Yao Hong-Ge, Zhang Wei, Yang Hao-Qi, Yu Jun

Citation: Yao Hong-Ge, Zhang Wei, Yang Hao-Qi, Yu Jun. Joint regression object localization based on deep reinforcement learning. Acta Automatica Sinica, 2023, 49(2): 1−10 doi: 10.16383/j.aas.c200045

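Author biographies:

    YAO Hong-Ge  Associate professor at the School of Computer Science and Engineering, Xi'an Technological University. Research interests include machine learning and computer vision. E-mail: yaohongge@xatu.edu.cn

    ZHANG Wei  Master's student at the School of Computer Science and Engineering, Xi'an Technological University. Research interests include machine learning and computer vision. Corresponding author of this paper. E-mail: weivanity@gmail.com

    YANG Hao-Qi  Master's student at the School of Computer Science and Engineering, Xi'an Technological University. Research interests include object detection, capsule networks, and model quantization. E-mail: curioyhq@gmail.com

    YU Jun  Professor at the School of Computer Science and Engineering, Xi'an Technological University. Research interests include image processing and pattern recognition. E-mail: yujun@xatu.edu.cn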


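  • Abstract: To simulate the visual attention mechanism of the human eye and to search for and localize objects in images quickly and efficiently, this paper proposes a joint-regression deep reinforcement learning model for object localization based on a recurrent neural network. The model fuses historical observations with the observation at the current time step and analyzes them jointly, training an agent to localize the object quickly; a regressor is then combined with the agent to fine-tune the bounding box the agent has localized. Experimental results show that the model can localize objects quickly and accurately within a small number of time steps.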
  • Fig. 1  Fusion representation of state information

    Fig. 2  Schematic diagram of the actions

    Fig. 3  Overall structure of the model

    Fig. 4  Integration network $ f_c(\theta_c) $

    Fig. 5  Action network $ f_a(\theta_a) $

    Fig. 6  Location network $ f_l(\theta_l) $

    Fig. 7  Regression network $ f_g(\theta_g) $

    Fig. 8  Action network training chart

    Fig. 9  Regression network training chart

    Fig. 10  Location network training chart

    Fig. 11  Model training loss curves

    Fig. 12  Test result example 1

    Fig. 13  Test result example 2

    Fig. 14  Test result example 3

    Fig. 15  Test result example 4

    Fig. 16  Test result example 5

    Fig. 17  Test result example 6

    Fig. 18  IoU variation trend for the test result examples

    Fig. 19  IoU overlap region after fine adjustment by the regressor

    Table 1  Localization accuracy of different algorithms on the VOC 2007 test set (selected categories)

    Algorithm     Aero  Bike  Bird  Boat  Bottle  Bus   Car   Cat   mAP
    Faster R-CNN  86.5  81.6  77.2  58.0  51.0    78.6  76.6  93.2  75.3
    Caicedo       57.9  56.7  38.4  33.0  17.5    51.1  52.7  53.0  45.0
    Bueno         56.1  52.0  42.2  38.4  22.1    46.7  42.2  52.6  44.0
    UR-DRQN       59.4  58.7  44.6  36.1  28.3    55.3  48.4  52.4  47.9
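
As a quick consistency check on Table 1, the sketch below recomputes the mAP column as the unweighted mean of the eight listed per-category values. This is an assumption about how the column was obtained (the page does not state it), and the names `TABLE_1`, `mean_ap`, and `check_table` are illustrative only. Under that assumption, every recomputed value matches the reported mAP to within rounding.

```python
# Per-category values transcribed from Table 1 (Aero, Bike, Bird, Boat,
# Bottle, Bus, Car, Cat), paired with the mAP reported in the same row.
TABLE_1 = {
    "Faster R-CNN": ([86.5, 81.6, 77.2, 58.0, 51.0, 78.6, 76.6, 93.2], 75.3),
    "Caicedo":      ([57.9, 56.7, 38.4, 33.0, 17.5, 51.1, 52.7, 53.0], 45.0),
    "Bueno":        ([56.1, 52.0, 42.2, 38.4, 22.1, 46.7, 42.2, 52.6], 44.0),
    "UR-DRQN":      ([59.4, 58.7, 44.6, 36.1, 28.3, 55.3, 48.4, 52.4], 47.9),
}


def mean_ap(per_class_ap):
    """Unweighted mean of per-category average precision values."""
    return sum(per_class_ap) / len(per_class_ap)


def check_table(table, tol=0.05):
    """Map each algorithm to (recomputed mAP, reported mAP, agrees within tol).

    tol=0.05 allows for the reported values being rounded to one decimal.
    Assumption: mAP here is the mean over the listed categories only.
    """
    result = {}
    for name, (per_class, reported) in table.items():
        recomputed = mean_ap(per_class)
        result[name] = (recomputed, reported, abs(recomputed - reported) < tol)
    return result


if __name__ == "__main__":
    for name, (recomputed, reported, ok) in check_table(TABLE_1).items():
        print(f"{name:13s} recomputed={recomputed:.2f} reported={reported:.1f} match={ok}")
```

Because the table is only an excerpt of the VOC categories, the mAP values here should be read as averages over these eight categories, not over the full benchmark.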

    Table 2  Average localization time per epoch for different algorithms (s/epoch)

    Algorithm                    Faster R-CNN  Caicedo  Bueno  UR-DRQN
    Localization time (s/epoch)  372           271      251    219
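
To make the timing comparison in Table 2 easier to read, the following sketch expresses UR-DRQN's per-epoch time as a relative reduction against each of the other algorithms. The figures are taken directly from the table; the helper names `relative_reduction` and `reductions_vs` are hypothetical and not from the paper.

```python
# Seconds per epoch transcribed from Table 2.
TIME_PER_EPOCH = {
    "Faster R-CNN": 372,
    "Caicedo": 271,
    "Bueno": 251,
    "UR-DRQN": 219,
}


def relative_reduction(baseline, candidate):
    """Fraction of the baseline time saved by the candidate (0.25 = 25% less)."""
    return (baseline - candidate) / baseline


def reductions_vs(times, reference="UR-DRQN"):
    """Relative time saved by `reference` against every other entry in `times`."""
    ref_time = times[reference]
    return {
        name: relative_reduction(t, ref_time)
        for name, t in times.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, frac in reductions_vs(TIME_PER_EPOCH).items():
        print(f"UR-DRQN needs {frac:.1%} less time per epoch than {name}")
```

On these numbers, UR-DRQN takes roughly 41% less time per epoch than Faster R-CNN, 19% less than Caicedo, and 13% less than Bueno.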
  • [1] Wang Ya-Shen, Huang He-Yan, Feng Chong, Zhou Qiang. Conceptual sentence embeddings based on attention mechanism. Acta Automatica Sinica, 2020, 46(7): 1390−1400
    [2] Sherstinsky A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena, 2020, 404: 132306. doi: 10.1016/j.physd.2019.132306
    [3] Sun Chang-Yin, Mu Chao-Xu. Important scientific problems of multi-agent deep reinforcement learning. Acta Automatica Sinica, 2020, 46(7): 1301−1312
    [4] Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence. Arizona, USA: AAAI, 2016. 2094−2100
    [5] Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, et al. Playing Atari with deep reinforcement learning. arXiv preprint, 2013, arXiv: 1312.5602
    [6] Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529−533. doi: 10.1038/nature14236
    [7] Rahman M A, Wang Y. Optimizing intersection-over-union in deep neural networks for image segmentation. In: Proceedings of the International Symposium on Visual Computing. Cham, Switzerland: Springer, 2016. 234−244
    [8] Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE, 2014. 580−587
    [9] Girshick R. Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 1440−1448
    [10] Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In: Proceedings of the Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2015. 91−99
    [11] Mnih V, Heess N, Graves A. Recurrent models of visual attention. In: Proceedings of the Advances in Neural Information Processing Systems. Montreal, Canada: 2014. 2204−2212
    [12] Caicedo J C, Lazebnik S. Active object localization with deep reinforcement learning. In: Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 2488−2496
    [13] Bueno M B, Giró-i-Nieto X, Marqués F, et al. Hierarchical object detection with deep reinforcement learning. Deep Learning for Image Processing Applications, 2017, 31(164): 3
    [14] Hara K, Liu M Y, Tuzel O, Farahmand A M. Attentional network for visual object detection. arXiv preprint, 2017, arXiv: 1702.01478
    [15] Shah S M, Borkar V S. Q-learning for Markov decision processes with a satisfiability criterion. Systems & Control Letters, 2018, 113: 45−51
    [16] Garcia F, Thomas P S. A meta-MDP approach to exploration for lifelong reinforcement learning. In: Proceedings of the Advances in Neural Information Processing Systems. Vancouver, Canada: MIT Press, 2019. 5691−5700
    [17] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge, USA: MIT Press, 2018
    [18] March J G. Exploration and exploitation in organizational learning. Organization Science, 1991, 2(1): 71−87. doi: 10.1287/orsc.2.1.71
    [19] Bertsekas D P. Dynamic Programming and Optimal Control. Belmont: Athena Scientific, 1995
Publication history
  • Received: 2020-01-20
  • Accepted: 2020-09-07
  • Published online: 2023-01-07
