Impact Factor 2.765 (2022, CJCR)

Indexed in:

  • Chinese Core Journals
  • EI
  • China Science and Technology Core Journals
  • Scopus
  • CSCD
  • Science Abstracts (UK)

Scene Restoration and Semantic Classification Network Using Depth Map and Discrete Pooling Technology

LIN Jin-Hua, YAO Yu, WANG Ying

Citation: LIN Jin-Hua, YAO Yu, WANG Ying. Scene Restoration and Semantic Classification Network Using Depth Map and Discrete Pooling Technology. ACTA AUTOMATICA SINICA, 2019, 45(11): 2178-2186. doi: 10.16383/j.aas.2018.c170439


doi: 10.16383/j.aas.2018.c170439
Funds:

National Natural Science Foundation of China 51705032

National High Technology Research and Development Program of China (863 Program) 2014AA7031010B

"Thirteenth Five-Year" Science and Technology Research Project of the Education Department of Jilin Province 2016345

More Information
    Author Bio:

    YAO Yu  Ph.D., lecturer at Changchun University of Technology. Her research interest covers modeling, filtering, and control of complex electromechanical systems. E-mail: yaoyu@ccut.edu.cn

    WANG Ying  Ph.D., lecturer at Changchun University of Technology. Her main research interest is digital image processing. E-mail: wangying@ccut.edu.cn

    Corresponding author:

    LIN Jin-Hua  Ph.D., lecturer at Changchun University of Technology. Her research interest covers digital image processing, target recognition, and tracking. Corresponding author of this paper. E-mail: linjinhua@ccut.edu.cn

  • Abstract: In machine vision perception systems, robustly reconstructing a 3D scene and its semantic information from incomplete, occluded target objects is crucial. Existing methods generally handle these two tasks separately; this paper combines them and proposes a scene restoration and semantic classification network based on depth maps and discrete pooling, which reconstructs and classifies the 3D target scene from the RGB-D information in the depth map. First, a CPU-to-GPU deep convolutional neural network model is built that takes depth images sampled from the sensor as input and learns the contextual target-scene information within the camera's projection area; the network outputs voxel-level semantic labels encoded with an improved truncated signed distance function (TSDF). Then, discrete pooling is used to refine the granularity of the network's pooling layers, and a semantic classification loss function with fine-grained pooling is designed to feed semantic relocalization back to the network. Finally, to strengthen the network's deep learning capability and robustness, a 3D target-scene dataset with semantic annotations is constructed. Experimental results show that, compared with current state-of-the-art network models, the proposed network enlarges the reconstruction scale by 2.1%, restores missing scenes well, and maintains semantic classification accuracy.
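The abstract's improved TSDF encoding is not specified on this page, but the standard truncated signed distance function it builds on can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name and the truncation-band value are assumptions.

```python
import numpy as np

def tsdf_encode(distances, truncation=0.1):
    """Standard TSDF: scale signed distances to the nearest surface
    by the truncation band and clamp the result to [-1, 1].
    Voxels far in front of the surface saturate at +1, voxels far
    behind it at -1, and voxels on the surface map to 0."""
    return np.clip(distances / truncation, -1.0, 1.0)

# Signed distances (metres) of five voxels to the nearest surface.
d = np.array([-0.3, -0.05, 0.0, 0.05, 0.3])
print(tsdf_encode(d))  # [-1.  -0.5  0.   0.5  1. ]
```

An "improved" TSDF, as in the paper, would change how distances are measured or weighted before this clamping step; only the clamping skeleton is shown here.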
  • Fig. 1  3D reconstruction and semantic classification with the proposed depth convolutional neural network

    Fig. 2  Visualization of several common TSDF encodings

    Fig. 3  The proposed depth convolutional neural network model

    Fig. 4  Convolutional pipeline of the proposed semantic classification

    Fig. 5  The camera receiving range directly affects network performance

    Fig. 6  Dot-product distributions of the network with binary weights and quantized activations. (a), (b), (c), and (d) are the dot-product distributions of pooling layer 1, convolution layer 3, pooling layer 6, and convolution layer 7, respectively, each with a different mean and standard deviation; (e), (f), (g), and (h) are the corresponding dot-product error distribution curves.
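Fig. 6 examines the distribution of layer dot products when weights are binarized and activations are quantized. As background, the dot product with sign-binarized weights can be sketched as follows; the sign binarization without a scaling factor and the Gaussian activations are illustrative assumptions, not the paper's network.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_dot(x, w):
    """Dot product using binarized weights sign(w) in {-1, +1}.
    This is the quantity whose per-layer distribution (mean and
    standard deviation) a plot like Fig. 6 would characterize."""
    wb = np.where(w >= 0, 1.0, -1.0)  # sign binarization
    return x @ wb

x = rng.normal(size=(1000, 64))  # 1000 activation vectors of width 64
w = rng.normal(size=64)          # real-valued weights to binarize
dots = binary_dot(x, w)
# For i.i.d. standard-normal activations, each dot product is a sum of
# 64 terms of variance 1, so the distribution has mean near 0 and
# standard deviation near sqrt(64) = 8.
print(dots.mean(), dots.std())
```

The binarization error curves in panels (e)-(h) would then compare `binary_dot(x, w)` against the full-precision product `x @ w`.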

    Fig. 7  Visual comparison of several completion networks

    Fig. 8  Surrounding objects predicted by the proposed network

    Fig. 9  Effect of the improved TSDF encoding on semantic scene completion

    Table 1  Comparison of restoration and semantic classification performance of the proposed network and the L and GW networks (%)

                                      L      GW     Ours (NYU)  Ours (LS_3DDS)  Ours (NYU+LS_3DDS)
    Restoration     Closure rate    59.6    66.8      57.0          55.6             69.3
                    IoU             37.8    46.4      59.1          58.2             58.6
    Semantic scene  Ceiling          0      14.2      17.1           8.8             19.1
    restoration     Floor           15.7    65.5      92.7          85.8             94.6
                    Wall            16.7    17.1      28.4          15.6             29.7
                    Window          15.6     8.7       0             7.4             18.8
                    Chair            9.4     4.5      15.6          18.9             19.3
                    Bed             27.3    46.6      37.1          37.4             53.6
                    Sofa            22.9    25.7      38.0          28.0             47.9
                    Table            7.2     9.3      18.0          18.7             19.9
                    Monitor          7.6     7.0       9.8           7.1             12.9
                    Furniture       15.6    27.7      28.1          10.4             30.1
                    Objects          2.1     8.3      15.1           6.4             11.6
                    Mean            18.3    26.8      32.0          27.6             37.3

    Table 2  Comparison of reconstruction performance of the proposed network with the F and Z networks (%)

                                   Training set   Restoration accuracy   Closure rate   IoU
    F restoration method               NYU               66.5                69.7       50.8
    Z restoration method               NYU               60.1                46.7       34.6
    Ours (restoration)                 NYU               66.3                96.9       64.8
    Ours (semantic restoration)        NYU               75.0                92.3       70.3
                                       LS_3DDS           75.0                96.0       73.0
Publication history
  • Received: 2017-08-01
  • Accepted: 2017-12-14
  • Published: 2019-11-20
