Object Detection Based on Relation Graph Network

Chen Sheng-Jia, Li Zhi-Xin, Tang Zhen-Jun, Ma Hui-Fang

Citation: Chen Sheng-Jia, Li Zhi-Xin, Tang Zhen-Jun, Ma Hui-Fang. Object detection based on relation graph network. Acta Automatica Sinica, 2020, 45(x): 1−16. doi: 10.16383/j.aas.c200517


doi: 10.16383/j.aas.c200517
Article information
    Author biographies:

    Chen Sheng-Jia: Master's student at the School of Computer Science and Information Engineering, Guangxi Normal University. Research interests: image understanding and machine learning. E-mail: csj_gxnu@126.com

    Li Zhi-Xin: Professor at the School of Computer Science and Information Engineering, Guangxi Normal University. Research interests: image understanding, machine learning, and cross-media computing. Corresponding author of this paper. E-mail: lizx@gxnu.edu.cn

    Tang Zhen-Jun: Professor at the School of Computer Science and Information Engineering, Guangxi Normal University. Research interests: digital image processing and multimedia information security. E-mail: zjtang@gxnu.edu.cn

    Ma Hui-Fang: Professor at the College of Computer Science and Engineering, Northwest Normal University. Research interests: data mining and machine learning. E-mail: mahuifang@nwnu.edu.cn


Funds: Supported by National Natural Science Foundation of China (61966004, 61663004, 61866004, 61962008, 61762078), Guangxi Natural Science Foundation (2019GXNSFDA245018)
  • Abstract: Conventional detectors focus only on information near the target region and ignore relational information between objects, which makes small objects difficult to recognize and limits detection performance. To capture and exploit these important relations, an object detection method based on graph convolutional networks is proposed, in which two independent relation graph networks extract, respectively, the global semantic information of object labels and the local spatial information of objects in the image. The semantic relation network captures implicit global knowledge: a directed graph is built over the dataset, each node is represented by the word embedding of its label, and the graph is fed into a graph convolutional network to obtain semantic relation features. The spatial relation network encodes relative spatial positions, enriching object features with local spatial information between objects. Experiments on the PASCAL VOC and MS COCO datasets show that key relational information significantly improves detection performance, strengthens the detector's ability to find small objects, and yields more reasonable bounding boxes. Compared with the baseline model, the proposed method achieves relative improvements of 31.8% in average precision and 32.3% in average recall on small objects.
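The semantic relation network described above propagates label word embeddings (GloVe [30]) over a graph with a graph convolution [13]. A minimal sketch of one such propagation step, using a small hypothetical co-occurrence graph and random stand-in embeddings rather than the paper's actual graph, dimensions, or weights:

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization with self-loops, as in Kipf & Welling's GCN:
    A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def gcn_layer(A_hat, X, W):
    """One GCN propagation step: LeakyReLU(A_hat @ X @ W)."""
    Z = A_hat @ X @ W
    return np.where(Z > 0, Z, 0.01 * Z)  # LeakyReLU

# Toy label graph: 4 categories, embedding dim 6, output dim 3 (all illustrative).
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # hypothetical label co-occurrence edges
X = rng.normal(size=(4, 6))                  # stand-in for GloVe label embeddings
W = rng.normal(size=(6, 3))                  # layer weights

A_hat = normalized_adjacency(A)
out = gcn_layer(A_hat, X, W)
print(out.shape)  # one semantic relation feature per category
```

In the paper's setting, each row of the output would serve as the semantic relation feature of one category, fused with the detector's region features.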
  • Fig. 1  An example demonstrating the semantic and spatial relation graphs

    Fig. 2  Overview of the Relation R-CNN object detection framework

    Fig. 3  Structure of the semantic relation network

    Fig. 4  Structure of the spatial relation network

    Fig. 5  Performance comparison with different values of η and γ

    Fig. 6  Performance comparison with different values of the semantic relation coefficient τ

    Fig. 7  Results of Faster R-CNN (top) and the semantic relation network (bottom) on the PASCAL VOC dataset

    Fig. 8  Visualization of global semantic relations mined on the PASCAL VOC dataset

    Fig. 9  Results of Faster R-CNN (top) and the spatial relation network (bottom) on the PASCAL VOC dataset

    Table 1  Object detection results on the PASCAL VOC 2007 dataset

    Method | Backbone | Training set | Input resolution | mAP (%)
    General detectors:
    Fast R-CNN[3] | VGG16 | 07+12 | 600×1000 | 70.0
    Faster R-CNN[4] | VGG16 | 07+12 | 600×1000 | 73.2
    SSD[6] | VGG16 | 07+12 | 321×321 | 75.1
    NOC[31] | VGG16 | 07+12 | 600×1000 | 73.3
    RON384[32] | VGG16 | 07+12 | 384×384 | 75.4
    Relation detectors:
    ION[5] | VGG16 | 07+12 | 600×1000 | 75.6
    SMN[10] | VGG16 | 07 | 600×1000 | 70.0
    SIN[11] | VGG16 | 07+12 | 600×1000 | 76.0
    RGC[23] | VGG16 | 07+12 | 600×1000 | 76.1
    KG-CNet[24] | VGG16 | 07 | 600×1000 | 66.6
    ACCNN[34] | VGG16 | 07+12 | 600×1000 | 72.0
    Relation R-CNN | VGG16 | 07+12 | 600×1000 | 76.6

    Table 2  Object detection results on the PASCAL VOC 2007 dataset using different backbones

    Method | Backbone | Training set | Input resolution | mAP (%)
    General detectors:
    Faster R-CNN[4] | ResNet101 | 07+12 | 600×1000 | 76.4
    SSD321[6] | ResNet101 | 07+12 | 600×1000 | 77.1
    YOLOv3[7] | DarkNet | 07+12 | 321×321 | 78.6
    CenterNet[18] | ResNet101 | 07+12 | 384×384 | 78.7
    DSOD300[33] | DenseNet | 07+12 | 600×1000 | 77.7
    Relation detectors:
    GBDNet[22] | Inception-ResNet | 07+12 | 600×1000 | 77.2
    HKRM[25] | ResNet101 | 07+12 | 600×1000 | 78.8
    Yes-Net416[35] | ResNet101 | 07+12 | 416×416 | 79.2
    Relation R-CNN | ResNet101 | 07+12 | 600×1000 | 78.9

    Table 3  Performance comparison of average precision on the MS COCO dataset

    Method | Backbone | AP | AP50 | AP75 | APS | APM | APL
    General detectors:
    Faster R-CNN*[4] | ResNet101 | 32.6 | 53.4 | 35.9 | 13.3 | 37.8 | 49.7
    Faster R-CNN[4] | ResNet101 | 34.7 | 54.7 | 37.2 | 14.8 | 39.4 | 51.8
    YOLOv3[7] | ResNet101 | 33.0 | 57.9 | 34.4 | 18.3 | 35.4 | 41.9
    TripleNet512[17] | ResNet50 | 35.9 | 57.8 | 38.0 | 17.7 | 37.2 | 50.7
    RefineDet512[19] | ResNet101 | 36.4 | 57.5 | 39.2 | 16.6 | 39.9 | 51.4
    RetinaNet500[20] | ResNet101 | 34.4 | 53.1 | 36.8 | 14.7 | 38.5 | 49.1
    CornerNet511[21] | Hourglass-52 | 37.8 | 53.7 | 40.1 | 17.0 | 39.0 | 50.5
    FPN[36] | ResNet101 | 37.2 | 57.9 | 40.6 | 19.1 | 40.8 | 48.6
    DSSD513[37] | ResNet101 | 33.2 | 53.3 | 35.2 | 13.0 | 35.4 | 51.1
    DFPR512[38] | ResNet101 | 34.6 | 54.3 | 37.3 | 14.7 | 38.1 | 51.9
    Relation detectors:
    ION[5] | VGG16 | 23.0 | 42.0 | 23.0 | 6.0 | 23.8 | 37.3
    SMN[10] | ResNet101 | 31.6 | 52.2 | 33.2 | 14.4 | 35.7 | 45.8
    SIN[11] | VGG16 | 23.2 | 44.5 | 22.0 | 7.3 | 24.5 | 36.3
    RelationNet[13] | ResNet101 | 35.4 | 56.1 | 38.5 | – | – | –
    RelationNet-FPN[13] | ResNet101 | 38.9 | 60.5 | 43.3 | – | – | –
    GBDNet[22] | Inception-ResNet | 27.0 | 45.8 | – | – | – | –
    KG-CNet[24] | VGG16 | 24.4 | – | – | – | – | –
    HKRM[25] | ResNet101 | 37.8 | – | – | – | – | –
    RepGN-FPN[39] | ResNet101 | 39.4 | – | – | – | – | –
    Relation R-CNN* | ResNet101 | 35.1 | 55.2 | 38.4 | 18.6 | 40.5 | 48.8
    Relation R-CNN | ResNet101 | 36.2 | 56.9 | 39.3 | 19.5 | 41.2 | 49.1
    Relation R-CNN-FPN-S | ResNet101 | 38.2 | 58.9 | 41.3 | 21.2 | 41.3 | 49.6
    Relation R-CNN-FPN | ResNet101 | 38.5 | 59.2 | 42.0 | 21.6 | 42.2 | 50.1

    Table 4  Performance comparison of average recall on the MS COCO dataset

    Method | Backbone | AR10 | AR100 | ARS | ARM | ARL
    Faster R-CNN[4] | ResNet101 | 45.4 | 46.6 | 23.5 | 53.1 | 66.1
    Relation R-CNN | ResNet101 | 47.3 | 48.7 | 31.1 | 54.5 | 64.3

    Table 5  Ablation study on the PASCAL VOC dataset

    Method | Faster R-CNN | Relation R-CNN
    Spatial
    Semantic (F)
    Semantic (A)
    mAP (%) | 73.2 | 74.6 | 75.0 | 76.3 | 76.6
    aeroplane | 76.5 | 78.4 | 78.6 | 78.8 | 78.4
    bicycle | 79.0 | 80.7 | 79.8 | 79.5 | 79.7
    bird | 70.9 | 77.6 | 76.6 | 75.2 | 74.9
    boat | 65.5 | 65.8 | 69.7 | 68.1 | 70.6
    bottle | 52.1 | 57.3 | 58.4 | 63.4 | 62.6
    bus | 83.1 | 81.3 | 82.9 | 85.2 | 86.9
    car | 84.7 | 85.5 | 86.7 | 87.7 | 87.7
    cat | 86.4 | 88.1 | 87.1 | 88.7 | 87.5
    chair | 52.0 | 53.5 | 54.5 | 58.1 | 58.8
    cow | 81.9 | 81.4 | 82.5 | 80.2 | 83.2
    table | 65.7 | 64.1 | 66.7 | 71.7 | 71.9
    dog | 84.8 | 85.2 | 87.0 | 85.9 | 85.9
    horse | 84.6 | 84.5 | 84.3 | 85.6 | 85.6
    motorbike | 77.5 | 79.1 | 78.9 | 78.4 | 78.9
    person | 76.7 | 77.2 | 78.6 | 78.9 | 79.2
    potted plant | 38.8 | 49.6 | 47.0 | 49.3 | 48.2
    sheep | 73.6 | 77.8 | 74.5 | 75.7 | 75.9
    sofa | 73.9 | 73.1 | 74.0 | 75.0 | 76.3
    train | 83.0 | 77.8 | 77.8 | 84.5 | 82.8
    tv | 72.6 | 74.4 | 75.2 | 75.3 | 76.9

    Table 6  Results of different weighted fusion methods for the networks on the PASCAL VOC 2007 dataset

    Feature fusion method | Weight ratio | mAP (%)
    LeakyReLU(β·X_semantic + (1−β)·X_spatial) | β : (1−β) | 76.5
    LeakyReLU(X_semantic || X_spatial) | 1 : 0.5 | 75.9
     | 0.5 : 1 | 74.2
     | 1 : 1 | 76.6
    LeakyReLU(X_semantic || X_spatial || X_cnn) | 1 : 0.5 : 0.5 | 74.1
     | 1 : 0.5 : 1 | 74.5
     | 1 : 1 : 0.5 | 73.5
     | 1 : 1 : 1 | 74.3
     | 0.5 : 0.5 : 1 | 73.4
     | 0.5 : 1 : 0.5 | 72.8
     | 0.5 : 1 : 1 | 73.0
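The concatenation-based fusion variants in Table 6 can be sketched as follows. The feature shapes and LeakyReLU slope are illustrative assumptions (not stated on this page); "||" denotes channel concatenation, and the weights correspond to Table 6's ratios:

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    """Element-wise LeakyReLU."""
    return np.where(x > 0, x, slope * x)

def fuse_concat(features, weights):
    """Weighted concatenation fusion, e.g. LeakyReLU(w1*X_semantic || w2*X_spatial).
    `features` and `weights` are parallel lists of arrays and scalars."""
    scaled = [w * f for w, f in zip(weights, features)]
    return leaky_relu(np.concatenate(scaled, axis=-1))

# Toy per-proposal features (2 proposals, 4 channels each) -- shapes are illustrative.
x_semantic = np.ones((2, 4))
x_spatial = -np.ones((2, 4))

fused = fuse_concat([x_semantic, x_spatial], [1.0, 1.0])  # the 1 : 1 row of Table 6
print(fused.shape)  # (2, 8)
```

The same helper covers the three-way variant by passing `[x_semantic, x_spatial, x_cnn]` with a three-element weight list; the best setting reported in Table 6 is the two-way 1 : 1 concatenation.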
  • [1] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2012. 1097−1105
    [2] Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2014. 580−587
    [3] Girshick R. Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision. Piscataway, USA: IEEE, 2015. 1440−1448
    [4] Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2015. 91−99
    [5] Bell S, Lawrence Zitnick C, Bala K, Girshick R. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2016. 2874−2883
    [6] Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y. SSD: Single shot multibox detector. In: Proceedings of European Conference on Computer Vision. Cham, Germany: Springer, 2016. 21−37
    [7] Redmon J, Farhadi A. YOLOv3: An incremental improvement. arXiv preprint arXiv: 1804.02767, 2018
    [8] Divvala S K, Hoiem D, Hays J, Efros A, Hebert M. An empirical study of context in object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2009. 1271−1278
    [9] Hu H, Gu J, Zhang Z, Dai J, Wei Y C. Relation networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2018. 3588−3597
    [10] Chen X, Gupta A. Spatial memory for context reasoning in object detection. In: Proceedings of the IEEE International Conference on Computer Vision. Piscataway, USA: IEEE, 2017. 4086−4096
    [11] Liu Y, Wang R, Shan S, Chen X. Structure inference net: Object detection using scene-level context and instance-level relationships. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2018. 6985−6994
    [12] Reed S, Akata Z, Lee H, Schiele B. Learning deep representations of fine-grained visual descriptions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2016. 49−58
    [13] Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv: 1609.02907, 2016
    [14] Velickovic P, Cucurull G, Casanova A. Graph attention networks. arXiv preprint arXiv: 1710.10903, 2017
    [15] Everingham M, Van Gool L, Williams C K I, Winn J M, Zisserman A. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 2010, 88(2): 303−338 doi: 10.1007/s11263-009-0275-4
    [16] Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D. Microsoft COCO: Common objects in context. In: Proceedings of European Conference on Computer Vision. Cham, Germany: Springer, 2014. 740−755
    [17] Cao J, Pang Y, Li X. Triply supervised decoder networks for joint detection and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2019. 7392−7401
    [18] Zhou X, Wang D, Krahenbuhl P. Objects as points. arXiv preprint arXiv: 1904.07850, 2019
    [19] Zhang S, Wen L, Bian X, Lei Z, Li S Z. Single-shot refinement neural network for object detection. In: Proceedings of the Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2018. 4203−4212
    [20] Lin T, Goyal P, Girshick R, He K, Dollár P. Focal loss for dense object detection. In: Proceedings of the International Conference on Computer Vision. Piscataway, USA: IEEE, 2017. 2999−3007
    [21] Law H, Deng J. CornerNet: Detecting objects as paired keypoints. In: Proceedings of the European Conference on Computer Vision. Cham, Germany: Springer, 2018. 765−781
    [22] Zeng X, Ouyang W, Yang B, Yan J, Wang X. Gated bi-directional CNN for object detection. In: Proceedings of the European Conference on Computer Vision. Cham, Germany: Springer, 2016. 354−369
    [23] He C, Lai S, Lam K. Improving object detection with relation graph inference. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing. Piscataway, USA: IEEE, 2019. 2537−2541
    [24] Fang Y, Kuan K, Lin J, Tan C, Chandrasekhar V. Object detection meets knowledge graphs. In: Proceedings of the International Joint Conference on Artificial Intelligence. San Francisco, CA: Morgan Kaufmann, 2017. 1661−1667
    [25] Jiang C, Xu H, Liang X, Lin L. Hybrid knowledge routed modules for large-scale object detection. In: Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2018. 1552−1563
    [26] Reed S, Akata Z, Lee H, Schiele B. Learning deep representations of fine-grained visual descriptions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2016. 49−58
    [27] He K, Gkioxari G, Dollár P, Girshick Ross B. Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision. Piscataway, USA: IEEE, 2017. 2961−2969
    [28] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv: 1409.1556, 2014
    [29] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2016. 770−778
    [30] Pennington J, Socher R, Manning C D. GloVe: Global vectors for word representation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: ACL, 2014. 1532−1543
    [31] Ren S, He K, Girshick R, Zhang X, Sun J. Object detection networks on convolutional feature maps. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(7): 1476−1481 doi: 10.1109/TPAMI.2016.2601099
    [32] Kong T, Sun F, Yao A, Liu H, Lu M, Chen Y. RON: Reverse connection with objectness prior networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2017. 5936−5944
    [33] Shen Z, Liu Z, Li J, Jiang Y G, Chen Y, Xue X. DSOD: Learning deeply supervised object detectors from scratch. In: Proceedings of the IEEE International Conference on Computer Vision. Piscataway, USA: IEEE, 2017. 1919−1927
    [34] Li J, Wei Y, Liang X, Dong J, Xu T, Feng J, Yan S. Attentive contexts for object detection. IEEE Transactions on Multimedia, 2017, 19(5): 944−954 doi: 10.1109/TMM.2016.2642789
    [35] Ma L, Kan X, Xiao Q, Liu W, Sun P. Yes-Net: An effective detector based on global information. arXiv preprint arXiv: 1706.09180, 2017
    [36] Lin T Y, Dollar P, Girshick R, He K, Hariharan B, Belongie S J. Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2017. 936−944
    [37] Fu C, Liu W, Ranga A, Tyagi A, Berg Alex C. DSSD: Deconvolutional single shot detector. arXiv preprint arXiv: 1701.06659, 2017
    [38] Kong T, Sun F, Huang W, Liu H. Deep feature pyramid reconfiguration for object detection. In: Proceedings of the European Conference on Computer Vision. Cham, Germany: Springer, 2018. 172−188
    [39] Du X, Shi X, Huang R. RepGN: Object detection with relational proposal graph network. arXiv preprint arXiv: 1904.08959, 2019
    [40] Li Yang, Wang Pu, Liu Yang, Liu Guo-Jun, Wang Chun-Yu, Liu Xiao-Yan, Guo Mao-Zu. Weakly supervised real-time object detection based on saliency map. Acta Automatica Sinica, 2020, 46(2): 242−255 (in Chinese)
    [41] Zhou Bo, Li Jun-Feng. Human action recognition combined with object detection. Acta Automatica Sinica, 2020, 46(9): 1961−1970 (in Chinese)
    [42] Wang Song-Tao, Zhou Zhen, Jin Wei, Qu Han-Bing. Saliency detection for RGB-D images under Bayesian framework. Acta Automatica Sinica, 2020, 46(4): 695−720 (in Chinese)
    [43] Li Yang, Wang Pu, Liu Yang, Liu Guo-Jun, Wang Chun-Yu, Liu Xiao-Yan, Guo Mao-Zu. Weakly supervised real-time object detection based on saliency map. Acta Automatica Sinica, 2020, 46(2): 242−255 (in Chinese)
Publication history
  • Available online: 2020-12-07
