A Survey of Panoptic Segmentation Methods

Xu Peng-Bin, Qu An-Guo, Wang Kun-Feng, Li Da-Zi

Citation: Xu Peng-Bin, Qu An-Guo, Wang Kun-Feng, Li Da-Zi. A survey of panoptic segmentation methods. Acta Automatica Sinica, 2020, 41(x): 1−20 doi: 10.16383/j.aas.c200657

doi: 10.16383/j.aas.c200657
Funds: Supported by National Natural Science Foundation of China (62076020, 61873022) and Beijing Municipal Natural Science Foundation (4182045)

Author biographies:

    Xu Peng-Bin: Master's student at the College of Information Science and Technology, Beijing University of Chemical Technology. Received the bachelor's degree from North China Electric Power University in 2019. Research interests include deep learning, computer vision, and panoptic image segmentation. E-mail: 2019210488@mail.buct.edu.cn

    Qu An-Guo: Master's student at the College of Information Science and Technology, Beijing University of Chemical Technology. Received the bachelor's degree from Beijing Institute of Technology in 2018. Research interests include deep learning, computer vision, and panoptic image segmentation. E-mail: 2018210472@mail.buct.edu.cn

    Wang Kun-Feng: Professor at the College of Information Science and Technology, Beijing University of Chemical Technology. Research interests include computer vision, machine learning, and intelligent unmanned systems. Corresponding author of this paper. E-mail: wangkf@mail.buct.edu.cn

    Li Da-Zi: Professor at the College of Information Science and Technology, Beijing University of Chemical Technology. Research interests include artificial intelligence, advanced control, fractional-order systems, and modeling and optimization of complex systems. E-mail: lidz@mail.buct.edu.cn
  • Abstract: Panoptic segmentation is a novel and important research problem in computer vision. It underpins emerging technologies such as machine perception and autonomous driving, and is therefore of great research significance. This paper surveys recent progress in deep-learning-based panoptic segmentation: it summarizes the basic processing pipeline of the panoptic segmentation task, classifies the published panoptic segmentation works according to their network architectures, and gives a comprehensive introduction and analysis of them. Finally, it analyzes the open problems and future development trends of panoptic segmentation and proposes some practical ideas for addressing these problems.
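The pipeline summarized above (and shown in Fig. 1) typically runs a semantic segmentation branch and an instance segmentation branch over a shared backbone and then fuses the two outputs into a single panoptic map. The sketch below illustrates one common heuristic fusion step; it is a minimal illustration under assumed inputs (boolean instance masks with scores, a per-pixel semantic map) with hypothetical function and parameter names, not the fusion rule of any specific method surveyed here.

```python
import numpy as np

def fuse_panoptic(semantic_pred, instances, keep_frac=0.5, stuff_area_min=0):
    """Merge a per-pixel semantic map with a list of instance masks into one panoptic map.

    semantic_pred : (H, W) integer array of per-pixel class ids from the semantic branch.
    instances     : list of dicts {"mask": (H, W) bool array, "score": float, "class_id": int}
                    from the instance branch.
    Returns a (H, W) segment-id map and a list of segment records.
    """
    h, w = semantic_pred.shape
    panoptic = np.zeros((h, w), dtype=np.int32)  # 0 means "not yet assigned"
    segments = []
    next_id = 1

    # 1) Resolve overlaps between "thing" instances: higher-scoring instances claim pixels first.
    for inst in sorted(instances, key=lambda d: d["score"], reverse=True):
        visible = inst["mask"] & (panoptic == 0)
        # Drop instances that are mostly hidden behind already-placed, higher-scoring ones.
        if visible.sum() < keep_frac * max(inst["mask"].sum(), 1):
            continue
        panoptic[visible] = next_id
        segments.append({"id": next_id, "class_id": inst["class_id"], "isthing": True})
        next_id += 1

    # 2) Fill the remaining pixels with "stuff" regions taken from the semantic prediction
    #    (in practice, pixels predicted as thing classes would be filtered out here).
    for class_id in np.unique(semantic_pred):
        region = (semantic_pred == class_id) & (panoptic == 0)
        if region.sum() <= stuff_area_min:
            continue
        panoptic[region] = next_id
        segments.append({"id": next_id, "class_id": int(class_id), "isthing": False})
        next_id += 1

    return panoptic, segments
```

Many of the two-stage methods listed in Table 2 replace such a hand-crafted step with a learned fusion module, for example the parameter-free panoptic head of UPSNet [25] or the scene overlap graph of SOGNet [29].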
  • Fig. 1  The processing flow of panoptic segmentation

    Fig. 2  The structure of LeNet-5 [31]

    Fig. 3  The structure of VGGNet-16

    Fig. 4  The residual module of ResNet [15]

    Fig. 5  The processing flow of semantic segmentation

    Fig. 6  The structure of BlitzNet [50]

    Fig. 7  The structure of DeeperLab panoptic segmentation [52]

    Fig. 8  The structure of the Panoptic-DeepLab model [57]

    Fig. 9  The structure of the BBFNet model [58]

    Fig. 10  The structure of the PCV model [63]

    Fig. 11  The structure of the Axial-Attention model [64]

    Fig. 12  The structure of the TASCNet model [10]

    Fig. 13  The structure of Learning Instance Occlusion for Panoptic Segmentation [68]

    Fig. 14  The structure of the SOGNet model [29]

    Fig. 15  The structure of the BCRF network [24]

    Fig. 16  The structure of the EfficientPS model [76]

    Fig. 17  The structure of the BANet model [77]

    Fig. 18  The structure of the PanopticTrackNet model [78]

    Table 1  Performance comparison of existing single-stage methods

    | Method type  | Model                                                      | Dataset                         | PQ    | mIoU | AP   | mAP  | Inference time |
    |--------------|------------------------------------------------------------|---------------------------------|-------|------|------|------|----------------|
    | Single-stage | BlitzNet [50]                                              | Pascal VOC                      |       |      |      | 83.8 | 24 FPS         |
    |              | DeeperLab [52]                                             | Mapillary Vistas validation set | 31.95 |      |      |      |                |
    |              | Generator evaluator-selector net [53]                      | MS COCO                         | 33.7  |      |      |      |                |
    |              | FPSNet [27]                                                | Cityscapes validation set       | 55.1  |      |      |      | 114 ms         |
    |              | SpatialFlow [55]                                           | COCO 2017 test-dev split        | 47.3  | 36.7 |      |      |                |
    |              | Single-Shot Panoptic Segmentation [28]                     | COCO val2017                    | 32.4  | 27.9 | 33.1 |      | 21.8 FPS       |
    |              | Panoptic-DeepLab [57]                                      | COCO val set                    | 39.7  |      |      |      | 132 ms         |
    |              | Real-Time Panoptic Segmentation from Dense Detections [56] | COCO val set                    | 37.1  |      |      |      | 63 ms          |
    |              | BBFNet [58]                                                | Microsoft COCO-2017 dataset     | 37.1  |      |      |      |                |
    |              | Axial-DeepLab [64]                                         | Cityscapes test set             | 62.8  | 79.9 | 34.0 |      |                |
    |              | EPSNet [62]                                                | COCO val set                    | 38.6  |      |      |      | 53 ms          |
    |              | PCV [63]                                                   | Cityscapes val set              | 54.2  | 74.1 |      |      | 182.8 ms       |

    Table 2  Performance comparison of existing two-stage methods

    | Method type | Model                                                                     | Dataset                                            | PQ    | mIoU | AP    | mAP | Inference time |
    |-------------|---------------------------------------------------------------------------|----------------------------------------------------|-------|------|-------|-----|----------------|
    | Two-stage   | Weakly- and Semi-Supervised Panoptic Segmentation [23]                    | VOC 2012 validation set                            | 63.1  | 59.5 |       |     |                |
    |             | JSIS-Net [9]                                                              | COCO test-dev                                      | 27.2  |      |       |     |                |
    |             | TASCNet [10]                                                              | Cityscapes                                         | 60.4  | 78.7 | 39.09 |     |                |
    |             | AUNet [11]                                                                | Cityscapes val set                                 | 59.0  | 75.6 | 34.4  |     |                |
    |             | Panoptic Feature Pyramid Networks [26]                                    | COCO test-dev                                      | 40.9  |      |       |     |                |
    |             | UPSNet [25]                                                               | MS COCO                                            | 42.5  | 54.3 | 34.3  |     | 171 ms         |
    |             | Single Network Panoptic Segmentation for Street Scene Understanding [12]  | Mapillary Vistas                                   | 23.9  |      |       |     | 484 ms         |
    |             | OANet [13]                                                                | COCO 2018 panoptic segmentation challenge test-dev | 41.3  |      |       |     |                |
    |             | OCFusion [68]                                                             | MS-COCO test-dev dataset                           | 46.7  |      |       |     |                |
    |             | SOGNet [29]                                                               | MS COCO                                            | 43.7  |      |       |     |                |
    |             | PanDA [69]                                                                | MS COCO subsets                                    | 37.4  | 45.9 | 28.0  |     |                |
    |             | BCRF [24]                                                                 | Pascal VOC dataset                                 | 71.76 |      |       |     |                |
    |             | Unifying Training and Inference for Panoptic Segmentation [70]            | COCO test-dev set                                  | 47.2  |      |       |     |                |
    |             | BANet [77]                                                                | COCO val set                                       | 41.1  |      |       |     |                |
    |             | EfficientPS [76]                                                          | Cityscapes validation set                          | 63.6  | 79.3 | 37.4  |     | 166 ms         |
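The PQ (panoptic quality) values in Tables 1 and 2 follow the metric introduced in [1]: predicted and ground-truth segments of the same class are matched when their IoU exceeds 0.5, and PQ equals the sum of matched IoUs divided by (TP + ½FP + ½FN), which factors into segmentation quality (SQ) times recognition quality (RQ). Below is a minimal single-image sketch of that computation; the segment representation (boolean masks with class ids) is an assumption for illustration, and the full benchmark additionally averages PQ over classes and handles void/ignore regions.

```python
import numpy as np

def panoptic_quality(pred_segments, gt_segments, iou_thresh=0.5):
    """Compute PQ, SQ, RQ for one image.

    Each segment is a dict {"mask": (H, W) bool array, "class_id": int}.
    Segments of the same class are matched when IoU > iou_thresh; with
    non-overlapping segments, an IoU above 0.5 makes the match unique.
    """
    matched_gt, matched_pred = set(), set()
    iou_sum = 0.0

    for gi, gt in enumerate(gt_segments):
        for pi, pred in enumerate(pred_segments):
            if pi in matched_pred or pred["class_id"] != gt["class_id"]:
                continue
            inter = np.logical_and(gt["mask"], pred["mask"]).sum()
            union = np.logical_or(gt["mask"], pred["mask"]).sum()
            iou = inter / union if union > 0 else 0.0
            if iou > iou_thresh:
                matched_gt.add(gi)
                matched_pred.add(pi)
                iou_sum += iou
                break  # each ground-truth segment matches at most one prediction

    tp = len(matched_gt)
    fp = len(pred_segments) - len(matched_pred)  # unmatched predictions
    fn = len(gt_segments) - len(matched_gt)      # unmatched ground truth
    if tp + fp + fn == 0:
        return 0.0, 0.0, 0.0

    sq = iou_sum / tp if tp > 0 else 0.0          # average IoU of matched pairs
    rq = tp / (tp + 0.5 * fp + 0.5 * fn)          # F1-style recognition term
    return sq * rq, sq, rq                        # PQ = SQ * RQ
```

Reporting SQ and RQ alongside PQ separates mask quality from detection quality, which is why several of the surveyed papers quote all three.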
  • [1] Kirillov A, He K, Girshick R, et al. Panoptic Segmentation. In: Proceedings of the Conference on Computer Vision and Pattern Recognition. Los Angeles CA, United States: IEEE: 2019.9404−9413.
    [2] Hu Tao, Li Wei-Hua, Qin Xian-Xiang. Review of image semantic segmentation methods. Measurement and Control Technology, 2019, 38(7): 8−12
    [3] Yang Y, Hallman S, Ramanan D, et al. Layered Object Models for Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(9): 1731−1743 doi: 10.1109/TPAMI.2011.208
    [4] Ladický L, Russell C, Kohli P, et al. Associative hierarchical CRFs for object class image segmentation. 2009 IEEE 12th International Conference on Computer Vision. IEEE, 2009: 739−746.
    [5] Shelhamer E, Long J, Darrell T. Fully Convolutional Networks for Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640−651 doi: 10.1109/TPAMI.2016.2572683
    [6] Hariharan B, Arbeláez, Pablo, Girshick R, et al. Simultaneous Detection and Segmentation. European Conference on Computer Vision. Springer, Cham, 2014: 297−312.
    [7] Pinheiro P O, Collobert R, Dollar P. Learning to Segment Object Candidates. Advances in Neural Information Processing Systems. 2015: 1990−1998.
    [8] Zagoruyko S, Lerer A, Lin T Y, et al. A MultiPath Network for Object Detection. [Online], available: https://arxiv.org/abs/1604.02135, April 4, 2016.
    [9] De Geus D, Meletis P, Dubbelman G. Panoptic Segmentation with a Joint Semantic and Instance Segmentation Network. arXiv: 1809.02110, 2018.
    [10] Li J, Raventos A, Bhargava A, et al. Learning to Fuse Things and Stuff. arXiv: 1812.01192, 2018.
    [11] Li Y, Chen X, Zhu Z, et al. Attention-guided Unified Network for Panoptic Segmentation. In: Proceedings of the Conference on Computer Vision and Pattern Recognition. Los Angeles CA, United States: IEEE, 2019.7026−7035.
    [12] De Geus D, Meletis P, Dubbelman G, et al. Single Network Panoptic Segmentation for Street Scene Understanding. In: Proceedings of IEEE Intelligent Vehicles Symposium (IV). Paris, France: IEEE, 2019.709−715.
    [13] Liu H, Peng C, Yu C, et al. An End-To-End Network for Panoptic Segmentation. IEEE Conference on Computer Vision and Pattern Recognition, 2019: 6172−6181.
    [14] Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv: 1409.1556, 2014.
    [15] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770−778.
    [16] Huang G, Liu Z, et al. Densely Connected Convolutional Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. p. 4700−4708.
    [17] Howard A G, Zhu M, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv: 1704.04861, 2017.
    [18] Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4510−4520.
    [19] Howard A, Sandler M, Chu G, et al. Searching for MobileNetV3. Proceedings of the IEEE International Conference on Computer Vision. 2019: 1314−1324.
    [20] Xiao Xiao-Wei, Xiao Di, Lin Jin-Guo, Xiao Yu-Feng. Overview on multi-objective optimization problem research. Application Research of Computers, 2011, 28(03): 805−808, 827 doi: 10.3969/j.issn.1001-3695.2011.03.002
    [21] Ge Ji-Ke, Qiu Yu-Hui, Wu Chun-Ming, Pu Guo-Lin. Summary of genetic algorithms research. Application Research of Computers, 2008, (10): 2911−2916 doi: 10.3969/j.issn.1001-3695.2008.10.008
    [22] Wu Ying, Chen Ding-Fang, Tang Xiao-Bing, Zhu Shi-Jian, Huang Ying-Yun, Li Qing. Summarizing of neural network. Technological Progress and Countermeasures, 2002, 6: 133−134 doi: 10.3969/j.issn.1001-7348.2002.04.058
    [23] Li Q, Arnab A, Torr P H S. Weakly- and Semi-Supervised Panoptic Segmentation. Proceedings of the European Conference on Computer Vision. 2018: 102−118.
    [24] Jayasumana S, Ranasinghe K, Jayawardhana M, et al. Bipartite Conditional Random Fields for Panoptic Segmentation. arXiv: 1912.05307, 2019.
    [25] Xiong Y, Liao R, Zhao H, et al. UPSNet: A Unified Panoptic Segmentation Network. IEEE Conference on Computer Vision and Pattern Recognition, 2019: 8818−8826.
    [26] Kirillov A, Girshick R, He K, et al. Panoptic Feature Pyramid Networks. IEEE Conference on Computer Vision and Pattern Recognition, 2019: 6399−6408.
    [27] De Geus D, Meletis P, Dubbelman G, et al. Fast Panoptic Segmentation Network. IEEE Robotics and Automation Letters, 2020, 5(2): 1742−1749 doi: 10.1109/LRA.2020.2969919
    [28] Weber M, Luiten J, Leibe B, et al. Single-Shot Panoptic Segmentation. arXiv: 1911.00764, 2019.
    [29] Yang Y, Li H, Li X, et al. SOGNet: Scene Overlap Graph Network for Panoptic Segmentation. arXiv: 1911.07527, 2019.
    [30] Kunihiko Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 1980, 36(4)
    [31] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11): 2278−2324 doi: 10.1109/5.726791
    [32] Krizhevsky A, Sutskever I, Hinton G E. ImageNet Classification with Deep Convolutional Neural Networks. Communications of the ACM, 2017, 60(6): 84−90 doi: 10.1145/3065386
    [33] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. IEEE Conference on Computer Vision and Pattern Recognition, 2015: 1−9.
    [34] Chen L, Papandreou G, Kokkinos I, et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. arXiv: 1412.7062, 2014.
    [35] Chen L, Papandreou G, Kokkinos I, et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834−848 doi: 10.1109/TPAMI.2017.2699184
    [36] Chen L, Hermans A, Papandreou G, et al. MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features. IEEE Conference on Computer Vision and Pattern Recognition, 2018: 4013−4022.
    [37] Ronneberger O, Fischer P, Brox T, et al. U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer Assisted Intervention, 2015: 234−241.
    [38] Zhao H, Shi J, Qi X, et al. Pyramid Scene Parsing Network. IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6230−6239.
    [39] He K, Gkioxari G, Dollar P, et al. Mask R-CNN. International Conference on Computer Vision, 2017: 2980−2988.
    [40] Ren S, He K, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems. 2015: 91−99.
    [41] Liu S, Qi L, Qin H, et al. Path Aggregation Network for Instance Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8759−8768.
    [42] Zhang H, Tian Y, Wang K, et al. Mask SSD: An Effective Single-Stage Approach to Object Instance Segmentation. IEEE Transactions on Image Processing, 2019, 29: 2078−2093
    [43] Everingham M, Eslami S M, Van Gool L, et al. The Pascal Visual Object Classes Challenge: A Retrospective. International Journal of Computer Vision, 2015, 111(1): 98−136 doi: 10.1007/s11263-014-0733-5
    [44] Lin T, Maire M, Belongie S, et al. Microsoft COCO: Common Objects in Context. European Conference on Computer Vision, 2014: 740−755.
    [45] Cordts M, Omran M, Ramos S, et al. The Cityscapes Dataset for Semantic Urban Scene Understanding. IEEE Conference on Computer Vision and Pattern Recognition, 2016: 3213−3223.
    [46] Zhou B, Zhao H, Puig X, et al. Scene Parsing through ADE20K Dataset. IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2017.
    [47] Neuhold G, Ollmann T, Bulo S R, et al. The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes. IEEE International Conference on Computer Vision, 2017.
    [48] Garcia-Garcia A, Orts-Escolano S, Oprea S, et al. A review on deep learning techniques applied to semantic segmentation. arXiv: 1704.06857, 2017.
    [49] Lateef F, Ruichek Y. Survey on Semantic Segmentation Using Deep Learning Techniques. Neurocomputing, 2019, 338: 321−348 doi: 10.1016/j.neucom.2019.02.003
    [50] Dvornik N, Shmelkov K, Mairal J, et al. BlitzNet: A Real-Time Deep Network for Scene Understanding. Proceedings of the IEEE International Conference on Computer Vision. 2017: 4154−4162.
    [51] Liu W, Anguelov D, Erhan D, et al. SSD: Single Shot MultiBox Detector. European Conference on Computer Vision, 2016: 21−37.
    [52] Yang T, Collins M D, Zhu Y, et al. DeeperLab: Single-Shot Image Parser. arXiv: 1902.05093, 2019.
    [53] Eppel S, Aspuru-Guzik A. Generator Evaluator-Selector Net: A Modular Approach for Panoptic Segmentation. arXiv: 1908.09108, 2019.
    [54] Eppel S. Class-independent Sequential Full Image Segmentation, Using a Convolutional Net That Finds a Segment within an Attention Region, Given a Pointer Pixel within This Segment. arXiv: 1902.07810, 2019.
    [55] Chen Q, Cheng A, He X, et al. SpatialFlow: Bridging All Tasks for Panoptic Segmentation. IEEE Transactions on Circuits and Systems for Video Technology, 2020.
    [56] Hou R, Li J, Bhargava A, et al. Real-Time Panoptic Segmentation from Dense Detections. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, America: IEEE, 2020.8523−8532.
    [57] Cheng B, Collins M D, Zhu Y, et al. Panoptic-DeepLab. arXiv: 1910.04751, 2019.
    [58] Bonde U, Alcantarilla P F, Leutenegger S. Towards Bounding-Box Free Panoptic Segmentation. arXiv: 2002.07705, 2020.
    [59] Dai J, Qi H, Xiong Y, et al. Deformable Convolutional Networks. International Conference on Computer Vision, 2017: 764−773
    [60] Ballard D H. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition, 1981, 13(2): 111−122 doi: 10.1016/0031-3203(81)90009-1
    [61] Bai M, Urtasun R. Deep Watershed Transform for Instance Segmentation. IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2858−2866.
    [62] Chang C, Chang S, et al. EPSNet: Efficient Panoptic Segmentation Network with Cross-layer Attention Fusion. arXiv preprint arXiv: 2003.10142, 2020.
    [63] Wang H, Luo R, Maire M, et al. Pixel Consensus Voting for Panoptic Segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, America: IEEE, 2020.9464−9473.
    [64] Wang H, Zhu Y, Green B, et al. Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation. arXiv: 2003.07853, 2020.
    [65] Arnab A, Torr P H. Pixelwise Instance Segmentation with a Dynamically Instantiated Network. IEEE Conference on Computer Vision and Pattern Recognition, 2017: 879−888.
    [66] Rother C, Kolmogorov V, Blake A, et al. "GrabCut": interactive foreground extraction using iterated graph cuts. International Conference on Computer Graphics and Interactive Techniques, 2004, 23(3): 309−314
    [67] Arbelaez P, Ponttuset J, Barron J, et al. Multiscale Combinatorial Grouping. IEEE Conference on Computer Vision and Pattern Recognition, 2014: 328−335.
    [68] Lazarow J, Lee K, Tu Z, et al. Learning Instance Occlusion for Panoptic Segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, America: IEEE, 2020.10720−10729.
    [69] Liu Y, Perona P, Meister M, et al. PanDA: Panoptic Data Augmentation. arXiv: 1911.12317, 2019.
    [70] Li Q, Qi X, Torr P H, et al. Unifying Training and Inference for Panoptic Segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, America: IEEE, 2020.13320−13328.
    [71] Behley J, Milioto A, Stachniss C, et al. A Benchmark for LiDAR-based Panoptic Segmentation based on KITTI. arXiv: 2003.02371, 2020.
    [72] Behley J, Garbade M, Milioto A, et al. SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences. International Conference on Computer Vision, 2019: 9297−9307.
    [73] Thomas H, Qi C R, Deschaud J, et al. KPConv: Flexible and Deformable Convolution for Point Clouds. Proceedings of the IEEE International Conference on Computer Vision. 2019: 6411−6420.
    [74] Milioto A, Vizzo I, et al. RangeNet++: Fast and Accurate LiDAR Semantic Segmentation. 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019: 4213−4220.
    [75] Lang A H, Vora S, Caesar H, et al. PointPillars: Fast Encoders for Object Detection from Point Clouds. IEEE Conference on Computer Vision and Pattern Recognition, 2019: 12697−12705.
    [76] Mohan R, Valada A. EfficientPS: Efficient Panoptic Segmentation. arXiv: 2004.02307, 2020.
    [77] Chen Y, Lin G, Li S, et al. BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, America: IEEE, 2020.3793−3802.
    [78] Hurtado J, Mohan R, et al. MOPT: Multi-Object Panoptic Tracking. arXiv: 2004.08189, 2020.
    [79] Zhang Hui, Wang Kun-Feng, Wang Fei-Yue. Advances and perspectives on applications of deep learning in visual object detection. Acta Automatica Sinica, 2017, 43(8): 1289−1305
    [80] Meletis P, Wen X, et al. Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding. arXiv: 2004.07944, 2020.
Publication history
  • Received:  2020-09-23
  • Accepted:  2020-10-10
  • Published online:  2020-12-28
