Real-time Image Semantic Segmentation Based on Block Adaptive Feature Fusion

Huang Ting-Hong, Nie Zhuo-Yun, Wang Qing-Guo, Li Shuai, Yan Lai-Cheng, Guo Dong-Sheng

Citation: Huang Ting-Hong, Nie Zhuo-Yun, Wang Qing-Guo, Li Shuai, Yan Lai-Cheng, Guo Dong-Sheng. Real-time image semantic segmentation based on block adaptive feature fusion. Acta Automatica Sinica, 2021, 47(5): 1137-1148 doi: 10.16383/j.aas.c180645

doi: 10.16383/j.aas.c180645
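Funds:

National Natural Science Foundation of China 61403149

Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University ZQN-PY408

Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University Z14Y0002

Postgraduates' Innovative Fund in Scientific Research of Huaqiao University 17013082039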

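More Information
    Author Bio:

    HUANG Ting-Hong  Master's student at the College of Information Science and Engineering, Huaqiao University. He received his bachelor's degree from Huaqiao University in 2017. His research interests include reinforcement learning and deep learning. E-mail: 063mi@163.com

    WANG Qing-Guo  Professor at the Institute for Intelligent Systems, University of Johannesburg, South Africa, and professor at the National University of Singapore. He received his Ph.D. degree from Zhejiang University in 1987. His research interests include modeling, estimation, prediction, control, and optimization of complex systems. E-mail: wangqg02286@gmail.com

    LI Shuai  Research assistant professor at The Hong Kong Polytechnic University. He received his Ph.D. degree from Stevens Institute of Technology in 2014. His research interests include dynamic neural networks, wireless sensor networks, robotic networks, machine learning, and other dynamic problems defined on graphs. E-mail: shuaili@polyu.edu.hk

    YAN Lai-Cheng  Lecturer at the College of Information Science and Engineering, Huaqiao University. He received his master's degree from Chongqing University in 2007. His research interests include robot control, machine vision, and machine learning. E-mail: ylaicheng@126.com

    GUO Dong-Sheng  Associate professor at the College of Information Science and Engineering, Huaqiao University. He received his Ph.D. degree from Sun Yat-sen University in 2015. His research interests include robot control, neural networks, and numerical methods. E-mail: gdongsh@hqu.edu.cn

    Corresponding author:

    NIE Zhuo-Yun  Associate professor at the College of Information Science and Engineering, Huaqiao University. He received his Ph.D. degree from Central South University in 2012. His research interests include robust control and system modeling and identification. Corresponding author of this paper. E-mail: yezhuyun2004@sina.com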

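  • Abstract: In recent years, image semantic segmentation methods based on deep learning have developed rapidly and have been applied in fields such as robotics and autonomous driving. This paper proposes a real-time semantic segmentation algorithm based on block adaptive feature fusion (BAFF). Built on a lightweight convolutional network architecture, the algorithm fuses context features with block-wise adaptive weights, which effectively improves the accuracy of real-time semantic segmentation. First, the influence of the receptive fields of segmentation features at different convolutional layers on the segmentation result is analyzed, and a block-wise weighted feature fusion mechanism is proposed on the skip-connection structure (SkipNet). Then, three-dimensional convolution is used to integrate inter-layer features, and a feature-weight computation network based on depthwise separable convolution is built. Finally, block features are fused under the adaptive weights. Experimental results show that the proposed algorithm achieves a good balance between speed and accuracy in image segmentation and is robust when segmenting complex scenes.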
    Recommended by Associate Editor LIU Cheng-Lin
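The block-wise weighted fusion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `block_weighted_fusion`, the single-channel feature maps, and the convex combination `w * shallow + (1 - w) * deep` are assumptions made for illustration, and the per-block weights are supplied by hand here, whereas the paper computes them with a learned weight network.

```python
def block_weighted_fusion(shallow, deep, block_weights, block_size):
    """Fuse two equally sized 2-D feature maps using one weight per block.

    Assumption (not from the paper): each block's weight w scales the
    shallow feature and (1 - w) scales the deep feature.
    """
    height, width = len(shallow), len(shallow[0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Every pixel inside the same block shares the same weight.
            w = block_weights[y // block_size][x // block_size]
            fused[y][x] = w * shallow[y][x] + (1.0 - w) * deep[y][x]
    return fused


# Toy 4x4 feature maps split into four 2x2 blocks (hypothetical values).
shallow = [[1.0] * 4 for _ in range(4)]
deep = [[3.0] * 4 for _ in range(4)]
block_weights = [[1.0, 0.0],
                 [0.5, 0.25]]

fused = block_weighted_fusion(shallow, deep, block_weights, block_size=2)
for row in fused:
    print(row)
```

A weight of 1.0 keeps only the shallow feature in that block and 0.0 keeps only the deep one, so each region of the image can favor a different layer. Additive fusion, by contrast, applies the same sum to every position.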
  • Fig. 1  Comparison of block feature fusion and SkipNet additive fusion

    Fig. 2  The encoder-decoder structure

    Fig. 3  Semantic segmentation tests for different convolutional layers

    Fig. 4  Structure of the semantic segmentation network based on BAFF

    Fig. 5  Relation between CNN prediction accuracy and object size

    Fig. 6  Effect of block weighted fusion

    Fig. 7  Change in loss value during model training

    Fig. 8  Saliency maps of the feature fusion

    Fig. 9  Comparison of model accuracy

    Fig. 10  Comparison of semantic segmentation results

    Table 1  Comparison of model complexity before and after adding BAFF

    Model        | MIoU (%) | Operations (M) | Parameters (K)
    SkipNet      | 66.8     | 15 962.99      | 841.76
    BAFF-SkipNet | 70.5     | 15 963.23      | 843.17
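To put the Table 1 figures in perspective, the following sketch computes the accuracy gain and the relative overhead of adding BAFF. It only re-uses the numbers in the table; reading "15 962.99" as 15962.99 (a space used as a thousands separator) is an assumption, and the helper name `relative_increase` is illustrative.

```python
# Values copied from Table 1; the space in "15 962.99" is assumed to be
# a thousands separator.
skipnet = {"miou": 66.8, "ops_m": 15962.99, "params_k": 841.76}
baff = {"miou": 70.5, "ops_m": 15963.23, "params_k": 843.17}


def relative_increase(before, after):
    """Return the increase from `before` to `after` in percent."""
    return (after - before) / before * 100.0


miou_gain = baff["miou"] - skipnet["miou"]
ops_overhead = relative_increase(skipnet["ops_m"], baff["ops_m"])
params_overhead = relative_increase(skipnet["params_k"], baff["params_k"])

print(f"MIoU gain: {miou_gain:.1f} percentage points")
print(f"Extra operations: {ops_overhead:.4f} %")
print(f"Extra parameters: {params_overhead:.3f} %")
```

Under that reading, the gain in MIoU comes with well under one percent of additional operations and parameters.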

    Table 2  Per-class semantic segmentation accuracy comparison (%)

    Model          | road | swalk | build. | wall | fence | pole | tlight | sign | veg. | terrain | sky  | person | rider | car  | truck | bus  | train | mbike | bike
    Proposed       | 93.0 | 79.3  | 86.6   | 60.2 | 65.3  | 62.0 | 60.2   | 64.3 | 89.9 | 65.0    | 92.7 | 72.1   | 56.0  | 89.1 | 58.5  | 65.1 | 55.4  | 53.8  | 72.3
    ENet[28]       | 96.3 | 74.2  | 85.0   | 32.1 | 33.2  | 43.4 | 34.1   | 44.0 | 88.6 | 61.4    | 90.6 | 65.5   | 38.4  | 90.6 | 36.9  | 50.5 | 48.0  | 38.8  | 55.4
    ContextNet[29] | 97.6 | 79.2  | 88.8   | 43.8 | 42.8  | 37.9 | 52.0   | 58.8 | 90.0 | 66.9    | 91.9 | 72.1   | 53.9  | 91.6 | 54.0  | 66.4 | 58.3  | 48.9  | 61.0
    ERFNet[30]     | 97.7 | 81.0  | 89.8   | 42.5 | 48.0  | 56.3 | 59.8   | 65.3 | 91.4 | 68.2    | 94.2 | 76.8   | 57.1  | 92.8 | 50.8  | 60.1 | 51.8  | 47.3  | 61.6
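The per-class figures in Table 2 can be related to the MIoU values in Table 3 with a short check. The sketch below assumes that the Table 2 entries are per-class IoU percentages and that MIoU is their unweighted mean over the 19 classes. The page itself does not state either point, so treat the comparison as a plausibility check rather than the authors' evaluation procedure.

```python
# Column order of Table 2.
CLASSES = ["road", "swalk", "build.", "wall", "fence", "pole", "tlight",
           "sign", "veg.", "terrain", "sky", "person", "rider", "car",
           "truck", "bus", "train", "mbike", "bike"]

# Per-class accuracy (%) copied from Table 2.
PER_CLASS = {
    "Proposed": [93.0, 79.3, 86.6, 60.2, 65.3, 62.0, 60.2, 64.3, 89.9, 65.0,
                 92.7, 72.1, 56.0, 89.1, 58.5, 65.1, 55.4, 53.8, 72.3],
    "ENet": [96.3, 74.2, 85.0, 32.1, 33.2, 43.4, 34.1, 44.0, 88.6, 61.4,
             90.6, 65.5, 38.4, 90.6, 36.9, 50.5, 48.0, 38.8, 55.4],
    "ContextNet": [97.6, 79.2, 88.8, 43.8, 42.8, 37.9, 52.0, 58.8, 90.0, 66.9,
                   91.9, 72.1, 53.9, 91.6, 54.0, 66.4, 58.3, 48.9, 61.0],
    "ERFNet": [97.7, 81.0, 89.8, 42.5, 48.0, 56.3, 59.8, 65.3, 91.4, 68.2,
               94.2, 76.8, 57.1, 92.8, 50.8, 60.1, 51.8, 47.3, 61.6],
}

# MIoU (%) as reported in Table 3 (ContextNet is not listed there).
REPORTED_MIOU = {"Proposed": 70.5, "ENet": 58.3, "ERFNet": 68.0}


def mean_iou(values):
    """Unweighted mean over classes (assumed definition of MIoU)."""
    return sum(values) / len(values)


computed = {model: mean_iou(values) for model, values in PER_CLASS.items()}
for model, value in computed.items():
    reported = REPORTED_MIOU.get(model, "n/a")
    print(f"{model}: computed {value:.2f}, reported {reported}")
```

Small differences between the computed and reported values are expected, because the per-class entries are already rounded to one decimal place.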

    Table 3  Accuracy comparison of real-time semantic segmentation models

    Model      | MIoU (%) | Runtime (ms) | Parameters (M)
    Proposed   | 70.5     | 19.01        | 0.82
    ENet[28]   | 58.3     | 11.82        | 0.37
    ERFNet[30] | 68.0     | 19.64        | 2.18
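Runtimes in Table 3 are easier to compare with real-time requirements when expressed as frame rates. The conversion below assumes that each runtime is the latency of processing a single image with no batching, which the table does not state explicitly; `frames_per_second` is an illustrative helper name.

```python
# Runtime per image in milliseconds, copied from Table 3.
RUNTIME_MS = {"Proposed": 19.01, "ENet": 11.82, "ERFNet": 19.64}


def frames_per_second(runtime_ms):
    """Convert a per-image runtime in milliseconds to frames per second."""
    return 1000.0 / runtime_ms


fps = {model: frames_per_second(ms) for model, ms in RUNTIME_MS.items()}
for model, value in fps.items():
    print(f"{model}: {value:.1f} frames/s")
```

Under this assumption all three models run above 50 frames per second, with ENet the fastest and the least accurate of the three.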
  • [1] Rother C, Kolmogorov V, Blake A. GrabCut-interactive foreground extraction using iterated graph cuts. ACM Trans Graphics, 2004, 23(3): 309-314 doi: 10.1145/1015706.1015720
    [2] Xia Jian-Feng. Segmentation and recognition of cancer cells based on mathematical morphology. Electronic Science and Technology, 2016, 29(10): 36-38 doi: 10.3969/j.issn.1009-6108.2016.10.018
    [3] He X, Zemel R S, Ray D. Learning and incorporating top-down cues in image segmentation. In: Proceedings of the 9th European Conference on Computer Vision. Graz, Austria: Springer, 2006. 338-351
    [4] Raví D, Bober M, Farinella G M, Guarnera M, Battiato S. Semantic segmentation of images exploiting DCT based features and random forest. Pattern Recognition, 2016, 52(3): 260-273 http://smartsearch.nstl.gov.cn/paper_detail.html?id=2d9ec586cbd26742062209cc11d28290
    [5] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786): 504-507 doi: 10.1126/science.1127647
    [6] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651 doi: 10.1109/TPAMI.2016.2572683
    [7] Zeiler M D, Krishnan D, Taylor G W, Fergus R. Deconvolutional networks. In: Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2010. 2528-2535
    [8] Kirkland E J. Bilinear Interpolation. Advanced Computing in Electron Microscopy. Boston MA, USA: Springer-Verlag, 2010. 261-263
    [9] Zhang X Y, Zhou X Y, Lin M X, Sun J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 6848-6856
    [10] Howard A G, Zhu M L, Chen B, Kalenichenko D, Wang W J, Weyand T, Andreetto M, Adam H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv: 1704.04861, 2017.
    [11] Siam M, Gamal M, Abdel-Razek M, Yogamani S, Jagersand M. RTSeg: Real-time semantic segmentation comparative study. In: Proceedings of the 25th IEEE International Conference on Image Processing (ICIP). Athens, Greece: IEEE, 2018.
    [12] Pinheiro P O, Lin T Y, Collobert R, Dollár P. Learning to Refine Object Segments. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer, 2016. 75-91
    [13] Lin G S, Milan A, Shen C H, Reid I. RefineNet: Multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA: IEEE, 2017.
    [14] Liu W, Rabinovich A, Berg A C. ParseNet: Looking wider to see better. arXiv: 1506.04579, 2015.
    [15] Noh H, Hong S, Han B. Learning deconvolution network for semantic segmentation. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 1520-1528
    [16] Dang H T H. A guide to receptive field arithmetic for convolutional neural networks[Online], available: https://medium.com/mlreview/a-guide-to-receptive-field-arithmetic-for-convolutional-neural-networks-e0f514068807, April 5, 2017
    [17] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning (ICML). Lille, France: PMLR, 2015. 448-456
    [18] Chollet F. Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA: IEEE, 2017.
    [19] Sandler M, Howard A, Zhu M L, Zhmoginov A, Chen L C. Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation. arXiv: 1801.04381, 2018.
    [20] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: Proceedings of the 2015 Medical Image Computing and Computer-Assisted Intervention (MICCAI). Switzerland: Springer-Verlag, 2015. 234-241
    [21] Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. arXiv: 1511.07122, 2015.
    [22] Hu J, Shen L, Albanie S, Sun G, Wu E H. Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, arXiv: 1709.01507, 2017.
    [23] Zhang Ting, Li Yu-Jian, Hu Hai-He, Zhang Ya-Hong. A gender classification model based on cross-connected convolutional neural networks. Acta Automatica Sinica, 2016, 42(6): 858-865 doi: 10.16383/j.aas.2016.c150658
    [24] Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors. Nature, 1986, 323(6088): 533-536 doi: 10.1038/323533a0
    [25] Nielsen M A. Neural Networks and Deep Learning[Online], available: http://neuralnetworksanddeeplearning.com/, October 2, 2018
    [26] Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B. The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. 3213-3223
    [27] Garcia-Garcia A, Orts-Escolano S, Oprea S, Villena-Martinez V, Garcia-Rodriguez J. A review on deep learning techniques applied to semantic segmentation. arXiv: 1704.06857, 2017.
    [28] Paszke A, Chaurasia A, Kim S, Culurciello E. ENet: A deep neural network architecture for real-time semantic segmentation. arXiv: 1606.02147, 2016.
    [29] Poudel R P K, Bonde U, Liwicki S, Zach C. ContextNet: Exploring context and detail for semantic segmentation in real-time. arXiv: 1805.04554, 2018.
    [30] Romera E, Álvarez J M, Bergasa L M, Arroyo R. ERFNet: Efficient residual factorized convnet for real-time semantic segmentation. IEEE Transactions on Intelligent Transportation Systems, 2017, 19(1): 263-272 http://ieeexplore.ieee.org/document/8063438
Publication history
  • Received:  2018-10-01
  • Accepted:  2018-12-25
  • Published:  2021-05-21
