Image Semantic Segmentation Method Based on Context and Shallow Space Encoder-decoder Network

LUO Hui-Lan, LI Xiao

Citation: Luo Hui-Lan, Li Xiao. Image semantic segmentation method based on context and shallow space encoder-decoder network. Acta Automatica Sinica, 2022, 48(7): 1834−1846 doi: 10.16383/j.aas.c190372

Funds: Supported by National Natural Science Foundation of China (61862031, 61462035), Natural Science Foundation of Jiangxi Province (20171BAB202014), and Training Plan for Academic and Technical Leaders of Major Disciplines in Jiangxi Province (20213BCJ22004)
More Information
    Author Bio:

    LUO Hui-Lan Professor at the School of Information Engineering, Jiangxi University of Science and Technology. She received her Ph.D. degree in computer science and technology from Zhejiang University in 2008. Her research interest covers computer vision and machine learning. Corresponding author of this paper. E-mail: luohuilan@sina.com

    LI Xiao Master student at the School of Information Engineering, Jiangxi University of Science and Technology. His research interest covers computer vision and semantic segmentation. E-mail: williamlixiao@sina.com

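  • Abstract: Current research on image semantic segmentation largely revolves around two factors when designing more effective algorithms: how to extract effective semantic context information and how to restore spatial detail information. Among existing semantic segmentation models, some adopt a fully convolutional network structure to obtain effective semantic context information but neglect the spatial detail information in the shallow layers of the network; others adopt a U-shaped structure and exploit the spatial detail information of the encoder through complex network connections, but do not obtain high-quality semantic context features. To address this problem, this paper proposes a new semantic segmentation solution based on a context and shallow space encoder-decoder network. On the encoder side, a two-branch strategy is adopted: the context branch uses a newly designed semantic context module to obtain high-quality semantic context information, while the spatial branch is designed as a reverse U-shaped structure combined with chain inverted residual blocks, which preserves spatial detail information while enhancing semantic information. On the decoder side, a refinement module further optimizes the fused context and spatial information. The proposed method achieves competitive results on three benchmark datasets: CamVid, SUN RGB-D, and Cityscapes.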
  • Fig. 1  The network structures of our method and other methods

    Fig. 2  The network framework of our method (HAB: hybrid atrous convolution block; RPB: residual pyramid feature block; CRB: chain inverted residual block; RRB: residual recurrent convolution block; Deconv: transposed convolution; R: atrous rate)

    Fig. 3  The backbone structure of ResNet-34

    Fig. 4  Illustrations of the atrous convolution with three different atrous rates, from left to right r = 1, 3, 4

    Fig. 5  Hybrid atrous convolution block

    Fig. 6  Residual pyramid feature block

    Fig. 7  Chain inverted residual block

    Fig. 8  Residual recurrent convolution block

    Fig. 9  Qualitative comparisons with SegNet[12], CGNet[23] and BiSeNet (xception)[14] on the CamVid test set

    Fig. 10  Qualitative results of our method on the SUN RGB-D test set

    Fig. 11  Qualitative results of our method on the Cityscapes val set
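Fig. 4 refers to atrous convolution at rates r = 1, 3, 4. As a minimal sketch (not the authors' implementation), the span covered by a k-tap kernel at rate r is k + (k − 1)(r − 1), and a one-dimensional atrous convolution can be written in plain Python. The function names `effective_kernel_size` and `atrous_conv1d` and the 3-tap kernel are assumptions chosen for illustration only.

```python
def effective_kernel_size(k: int, rate: int) -> int:
    """Span covered by a k-tap kernel whose taps are `rate` samples apart."""
    return k + (k - 1) * (rate - 1)


def atrous_conv1d(signal, kernel, rate=1):
    """'Valid' 1-D atrous (dilated) convolution in cross-correlation form."""
    span = effective_kernel_size(len(kernel), rate)
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for i, weight in enumerate(kernel):
            # Taps are spaced `rate` samples apart instead of being adjacent.
            acc += weight * signal[start + i * rate]
        out.append(acc)
    return out


if __name__ == "__main__":
    # Rates from Fig. 4, applied to a 3-tap kernel (illustrative choice).
    for r in (1, 3, 4):
        print(f"r = {r}: effective size {effective_kernel_size(3, r)}")
    print(atrous_conv1d(list(range(10)), [1, 0, -1], rate=3))
```

With a 3-tap kernel the three rates cover 3, 7 and 9 samples, which is why a larger rate enlarges the receptive field without adding weights.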

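    Table 1  Comparison of MIoU between our method and the state-of-the-art methods on the CamVid test set (%)

    | Method | Tree | Sky | Building | Car | Sign | Road | Pedestrian | Fence | Pole | Sidewalk | Bicyclist | MIoU |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|
    | FCN-8[1] | - | - | - | - | - | - | - | - | - | - | - | 52.0 |
    | DeconvNet[8] | - | - | - | - | - | - | - | - | - | - | - | 48.9 |
    | SegNet[12] | 52.0 | 87.0 | 68.7 | 58.5 | 13.4 | 86.2 | 25.3 | 17.9 | 16.0 | 60.5 | 24.8 | 50.2 |
    | ENet[7] | 77.8 | 95.1 | 74.7 | 82.4 | 51.0 | 95.1 | 67.2 | 51.7 | 35.4 | 86.7 | 34.1 | 51.3 |
    | Dilation[2] | 76.2 | 89.9 | 82.6 | 84.0 | 46.9 | 92.2 | 56.3 | 35.8 | 23.4 | 75.3 | 55.5 | 65.29 |
    | LRN[10] | 73.6 | 76.4 | 78.6 | 75.2 | 40.1 | 91.7 | 43.5 | 41.0 | 30.4 | 80.1 | 46.5 | 61.7 |
    | FC-DenseNet103[11] | 77.3 | 93.0 | 83.0 | 77.3 | 43.9 | 94.5 | 59.6 | 37.1 | 37.8 | 82.2 | 50.5 | 66.9 |
    | G-FRNet[18] | 76.8 | 92.1 | 82.5 | 81.8 | 43.0 | 94.5 | 54.6 | 47.1 | 33.4 | 82.3 | 59.4 | 68.0 |
    | BiSeNet (xception)[14] | 74.4 | 91.9 | 82.2 | 80.8 | 42.8 | 93.3 | 53.8 | 49.7 | 31.9 | 81.4 | 54.0 | 65.6 |
    | CGNet[23] | - | - | - | - | - | - | - | - | - | - | - | 65.6 |
    | Ours | 75.8 | 92.4 | 81.9 | 82.2 | 43.3 | 94.3 | 59.0 | 42.3 | 37.3 | 80.2 | 61.3 | 68.26 |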
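The tables report mean intersection over union (MIoU). As a minimal sketch, assuming the common definition IoU = TP / (TP + FP + FN) per class averaged over classes, MIoU can be computed from a confusion matrix as follows. The evaluation protocol actually used in the paper (for example, how unlabeled pixels are handled) is not given on this page, and the function names and the toy matrix are illustrative assumptions.

```python
def per_class_iou(confusion):
    """IoU per class from a square confusion matrix.

    confusion[t][p] counts pixels whose true class is t and predicted class is p.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                         # true c, predicted other
        fp = sum(confusion[t][c] for t in range(n)) - tp    # predicted c, true other
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(confusion):
    """Average IoU over classes that appear in the labels or the predictions."""
    valid = [v for v in per_class_iou(confusion) if v == v]  # v == v drops NaN
    return sum(valid) / len(valid)


if __name__ == "__main__":
    toy = [[8, 2],
           [1, 9]]  # hypothetical two-class pixel counts
    print(per_class_iou(toy), mean_iou(toy))
```

For the toy matrix, class 0 has IoU 8/11 and class 1 has IoU 9/12, so the mean is roughly 0.739.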

    Table 2  Performance comparisons of our method and the state-of-the-art methods on the CamVid dataset

    | Method | Parameters (MB) | Runtime (ms) | Frame rate (frames/s) | MIoU (%) |
    |---|---|---|---|---|
    | SegNet[12] | 29 | 23.2 | 42 | 50.2 |
    | BiSeNet (xception)[14] | 5.8 | 12.1 | 82 | 65.6 |
    | BiSeNet (ResNet18)[14] | 49 | 64.8 | 15 | 68.7 |
    | Ours | 31 | 39.5 | 25 | 68.26 |
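Table 2 lists both a runtime in milliseconds and a frame rate. As a rough sanity check, assuming the frame rate is simply the reciprocal of the per-image runtime, the two columns can be compared as below. The row values are as read from the flattened Table 2, and the helper name `frames_per_second` is an illustrative assumption.

```python
# Rows as read from Table 2: (method, runtime in ms, listed frames/s).
TABLE2_ROWS = [
    ("SegNet", 23.2, 42),
    ("BiSeNet (xception)", 12.1, 82),
    ("BiSeNet (ResNet18)", 64.8, 15),
    ("Ours", 39.5, 25),
]


def frames_per_second(runtime_ms: float) -> float:
    """Throughput implied by a per-image runtime given in milliseconds."""
    if runtime_ms <= 0:
        raise ValueError("runtime must be positive")
    return 1000.0 / runtime_ms


if __name__ == "__main__":
    for method, runtime_ms, listed in TABLE2_ROWS:
        implied = frames_per_second(runtime_ms)
        print(f"{method}: implied {implied:.1f} frames/s, listed {listed}")
```

Under this assumption the implied rates stay within about one frame per second of the listed ones; the remaining gaps may come from rounding or from separate measurements.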

    Table 3  Comparison of MIoU between our method and the state-of-the-art methods on the SUN RGB-D test set (%)

    | Method | MIoU |
    |---|---|
    | FCN-8[1] | 27.4 |
    | DeconvNet[8] | 22.6 |
    | ENet[7] | 19.7 |
    | SegNet[12] | 31.8 |
    | DeepLab[4] | 32.1 |
    | Ours | 40.79 |

    Table 4  Comparisons of our method with the state-of-the-art methods on the Cityscapes test set

    | Method | Parameters (MB) | MIoU (%) |
    |---|---|---|
    | FCN-8[1] | 134.5 | 65.3 |
    | ENet[7] | 0.4 | 58.3 |
    | SegNet[12] | 29.5 | 56.1 |
    | DeepLab[4] | 44.04 | 70.4 |
    | Dilation[2] | - | 67.1 |
    | PSPNet[5] | 65.7 | 78.4 |
    | CGNet[23] | 0.5 | 64.8 |
    | BiSeNet (xception)[14] | 5.8 | 68.4 |
    | BiSeNet (ResNet18)[14] | 49 | 74.7 |
    | Ours | 31 | 73.1 |

    Table 5  The influence of HAB and RPB on performance (HAB: hybrid atrous convolution block; RPB: residual pyramid feature block)

    | HAB | RPB | MIoU (%) |
    |---|---|---|
    | × | × | 66.57 |
    | ✓ | × | 66.22 |
    | × | ✓ | 67.51 |
    | ✓ | ✓ | 68.26 |

    Table 6  The influence of HAB on performance (HAB: hybrid atrous convolution block; RPB: residual pyramid feature block)

    | HAB | RPB | Atrous rates of the HAB branches | MIoU (%) |
    |---|---|---|---|
    | ✓ | × | 2, 3, 4 | 66.22 |
    | ✓ | × | 1 | 67.84 |
    | ✓ | ✓ | 1 | 68.16 |
    | ✓ | ✓ | 2, 3, 4 | 68.26 |

    Table 7  The influence of the structural order of HAB and RPB on performance (HAB: hybrid atrous convolution block; RPB: residual pyramid feature block)

    | Method | MIoU (%) |
    |---|---|
    | HAB+HAB | 67.29 |
    | RPB+RPB | 66.95 |
    | RPB+HAB | 68.29 |
    | Ours (HAB+RPB) | 68.26 |

    Table 8  The influence of different spatial paths on performance (CRB: chain inverted residual block; SP: spatial path; LFP: low-level feature as spatial path; HFP: high-level feature as spatial path; RUP: reverse U-shaped spatial path)

    | Method | MIoU (%) |
    |---|---|
    | No SP | 63.51 |
    | LFP | 66.06 |
    | HFP | 67.49 |
    | RUP | 66.79 |
    | RUP+CRB | 68.26 |

    Table 9  The influence of CRB chain length on performance

    | Method | MIoU (%) |
    |---|---|
    | Chain length 1 on every path | 67.20 |
    | Chain length 3 on every path | 67.25 |
    | Ours (chain lengths 3, 2, 1 respectively) | 68.26 |

    Table 10  The influence of RRB on performance

    | Method | MIoU (%) |
    |---|---|
    | With the refinement module | 68.26 |
    | Without the refinement module | 67.50 |
  • [1] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640−651 doi: 10.1109/TPAMI.2016.2572683
    [2] Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. In: Proceedings of the 4th International Conference on Learning Representations (ICLR). San Juan, Puerto Rico, USA: Conference Track Proceedings, 2016.
    [3] Yu F, Koltun V, Funkhouser T. Dilated residual networks. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, Hawaii, USA: IEEE, 2017. 472−480
    [4] Chen L C, Papandreou G, Kokkinos I, Murphy K, Yuille A L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834−848 doi: 10.1109/TPAMI.2017.2699184
    [5] Zhao H S, Shi J P, Qi X J, Wang X G, Jia J Y. Pyramid scene parsing network. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, Hawaii, USA: IEEE, 2017. 2881−2890
    [6] Chen L C, Papandreou G, Schroff F, Adam H. Rethinking atrous convolution for semantic image segmentation. [Online], available: https://arxiv.org/abs/1706.05587v1, Jun 17, 2017
    [7] Paszke A, Chaurasia A, Kim S, Culurciello E. Enet: A deep neural network architecture for real-time semantic segmentation. [Online], available: https://arxiv.org/abs/1606.02147, Jun 7, 2016
    [8] Noh H, Hong S, Han B. Learning deconvolution network for semantic segmentation. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 1520−1528
    [9] Huang Ting-Hong, Nie Zhuo-Yun, Wang Qing-Guo, Li Shuai, Yan Lai-Cheng, Guo Dong-Sheng. Real-time image semantic segmentation based on block adaptive feature fusion. Acta Automatica Sinica, 2021, 47(5): 1137−1148
    [10] Islam M A, Naha S, Rochan M, Bruce N, Wang Y. Label refinement network for coarse-to-fine semantic segmentation. [Online], available: https://arxiv.org/abs/1703.00551v1, Mar 1, 2017
    [11] Jégou S, Drozdzal M, Vazquez D, Romero A, Bengio Y. The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Honolulu, Hawaii, USA: IEEE, 2017. 11−19
    [12] Badrinarayanan V, Kendall A, Cipolla R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481−2495 doi: 10.1109/TPAMI.2016.2644615
    [13] Zeiler M D, Fergus R. Visualizing and understanding convolutional networks. In: Proceedings of the 2014 European Conference on Computer Vision (ECCV). Zürich, Switzerland: Springer, 2014. 818−833
    [14] Yu C Q, Wang J B, Peng C, Gao C X, Yu G, Sang N. Bisenet: Bilateral segmentation network for real-time semantic segmentation. In: Proceedings of the 2018 European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018. 325−341
    [15] Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L C. MobileNetV2: Inverted residuals and linear bottlenecks. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, Utah, USA: IEEE, 2018. 4510−4520
    [16] Alom M Z, Hasan M, Yakopcic C, Taha T M, Asari V K. Recurrent residual convolutional neural network based on U-net (R2U-Net) for medical image segmentation. [Online], available: https://arxiv.org/abs/1802.06955v1, Feb 20, 2018
    [17] Chaurasia A, Culurciello E. Linknet: Exploiting encoder representations for efficient semantic segmentation. In: Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP). Saint Petersburg, Florida, USA: IEEE, 2017. 1−4
    [18] Amirul Islam M, Rochan M, Bruce N D, Wang Y. Gated feedback refinement network for dense image labeling. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, Hawaii, USA: IEEE, 2017. 3751−3759
    [19] Poudel R P, Bonde U, Liwicki S, Zach C. Contextnet: Exploring context and detail for semantic segmentation in real-time. In: Proceedings of the 2018 British Machine Vision Conference (BMVC). Northumbria University, Newcastle, UK: BMVA, 2018. 146
    [20] Poudel R P, Liwicki S, Cipolla R. Fast-SCNN: Fast semantic segmentation network. [Online], available: https://arxiv.org/abs/1902.04502, Feb 12, 2019
    [21] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 770−778
    [22] Deng J, Dong W, Socher R, Li J A, Li K, Li F F. ImageNet: A large-scale hierarchical image database. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Florida, USA: IEEE, 2009. 248−255
    [23] Wu T Y, Tang S, Zhang R, Zhang Y D. CGNet: A light-weight context guided network for semantic segmentation. [Online], available: https://arxiv.org/abs/1811.08201v1, Nov 20, 2018
    [24] Szegedy C, Ioffe S, Vanhoucke V, Alemi A A. Inception-v4, inception-resnet and the impact of residual connections on learning. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI). San Francisco, California, USA: AAAI, 2017. 4278−4284
    [25] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 2015 International Conference on Machine Learning (ICML). Lille, France: PMLR, 2015. 448−456
    [26] Brostow G J, Fauqueur J, Cipolla R. Semantic object classes in video: A high-definition ground truth database. Pattern Recognition Letters, 2009, 30(2): 88−97 doi: 10.1016/j.patrec.2008.04.005
    [27] Song S R, Lichtenberg S P, Xiao J X. Sun RGB-D: A RGB-D scene understanding benchmark suite. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, Massachusetts, USA: IEEE, 2015. 567−576
    [28] Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, et al. The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. 3213−3223
    [29] He K M, Zhang X Y, Ren S Q, Sun J. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 1026−1034
    [30] Kingma D P, Ba J. Adam: A method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations (ICLR). San Diego, CA, USA: Conference Track Proceedings, 2015.
Publication History
  • Received:  2019-05-15
  • Accepted:  2019-10-11
  • Published online:  2022-06-16
  • Issue date:  2022-07-01
