Scene Restoration and Semantic Classification Network Using Depth Map and Discrete Pooling Technology

LIN Jin-Hua^{1,2}; YAO Yu^1; WANG Ying^1

doi:10.16383/j.aas.2018.c170439

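1. School of Application Technology, Changchun University of Technology, Changchun 130012
2. Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130031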

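Funds:

National Natural Science Foundation of China 51705032

National High Technology Research and Development Program of China (863 Program) 2014AA7031010B

"13th Five-Year Plan" Science and Technology Research Project of the Education Department of Jilin Province 2016345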

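Author Bio:
YAO Yu, Ph. D., lecturer at Changchun University of Technology. Main research interests: modeling, filtering, and control of complex electromechanical systems. E-mail: yaoyu@ccut.edu.cn

WANG Ying, Ph. D., lecturer at Changchun University of Technology. Main research interest: digital image processing. E-mail: wangying@ccut.edu.cn

Corresponding author:
LIN Jin-Hua, Ph. D., lecturer at Changchun University of Technology. Main research interests: digital image processing, target recognition and tracking. Corresponding author of this paper. E-mail: linjinhua@ccut.edu.cn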

Metrics
- Article views: 2083
- HTML full-text views: 473
- PDF downloads: 127
- Citations: 0
Publication history
- Received: 2017-08-01
- Accepted: 2017-12-14
- Published: 2019-11-20


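Abstract: In machine vision perception systems, robustly reconstructing a 3D scene and its semantic information from incomplete, occluded target objects is essential. Existing methods usually handle these two tasks separately. This paper combines them and proposes a scene restoration and semantic classification network based on depth maps and discrete pooling technology, which uses the RGB-D information in the depth map to reconstruct and classify the 3D target scene. First, a deep convolutional neural network model running from the CPU end to the GPU end is constructed. It takes the depth image sampled by the sensor as input and learns the contextual target-scene information within the camera's projection region; the network outputs voxel-level semantic labels encoded with an improved truncated signed distance function (TSDF). Second, discrete pooling is used to refine the granularity of the pooling layers of the convolutional neural network, and a semantic classification loss function with fine-grained pooling is designed, whose output is fed back to the network for semantic classification relocalization. Finally, to strengthen the learning capability of the network, a 3D target scene dataset with semantic annotations is constructed, which improves the robustness of the proposed network. Experimental results show that, compared with current state-of-the-art network models, the proposed network enlarges the reconstruction scale by 2.1%, restores missing scene regions well, and maintains the accuracy of semantic classification.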
- Machine vision perception system /
- pooling technology /
- depth map /
- deep learning /
- convolutional neural network
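The abstract says the voxel-level output is encoded with an improved TSDF but does not specify the improvement. As a reference point only, the sketch below computes a standard projective TSDF value for one voxel from a depth image; it is not the authors' encoding. The pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the truncation distance `trunc`, and the function name `projective_tsdf` are hypothetical choices for illustration.

```python
from typing import List, Optional, Tuple


def projective_tsdf(
    voxel_xyz: Tuple[float, float, float],
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    trunc: float,
) -> Optional[float]:
    """Standard projective TSDF of one voxel centre given in camera coordinates.

    Returns a value in [-1, 1] (positive in front of the observed surface,
    negative behind it), or None when the voxel cannot be scored.
    Hypothetical helper: not the improved TSDF described in the paper.
    """
    x, y, z = voxel_xyz
    if z <= 0:
        return None  # voxel is behind the camera
    # Project the voxel centre onto the depth image with a pinhole model.
    u = int(round(fx * x / z + cx))
    v = int(round(fy * y / z + cy))
    if not (0 <= v < len(depth) and 0 <= u < len(depth[0])):
        return None  # projects outside the image
    d = depth[v][u]
    if d <= 0:
        return None  # missing sensor reading
    sdf = d - z  # signed depth difference along the optical axis
    if sdf < -trunc:
        return None  # too far behind the surface: occluded, not observed
    return max(-1.0, min(1.0, sdf / trunc))


if __name__ == "__main__":
    depth = [[2.0, 2.0], [2.0, 2.0]]  # toy 2x2 depth map in metres
    for z in (1.0, 1.9, 2.1, 2.5):
        print(z, projective_tsdf((0.0, 0.0, z), depth, 1.0, 1.0, 0.0, 0.0, trunc=0.2))
```

Voxels that return None would be treated as unobserved; a full volume is obtained by transforming every voxel centre into camera coordinates and evaluating it the same way. Truncation keeps the encoding local to the observed surface, which is why voxels far behind it are left unscored rather than assigned a large negative distance.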

Fig. 1 Scene reconstruction and semantic classification process of our deep convolutional neural network

Fig. 2 Visualization of commonly used TSDF encodings

Fig. 3 Our deep convolutional neural network model

Fig. 4 Convolution pipeline of our semantic classification

Fig. 5 The receiving range of our camera directly affects network performance

Fig. 6 Dot-product distributions of network layers with binary weights and quantized activations. (a), (b), (c), and (d) are the dot-product distributions of downsampling layer 1, convolution layer 3, downsampling layer 6, and convolution layer 7, respectively, each with a different mean and standard deviation; (e), (f), (g), and (h) are the corresponding dot-product error distribution curves of the same four layers.
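Fig. 6 plots dot-product distributions and dot-product errors for layers with binary weights and quantized activations. The page does not say how the weights are binarized or the activations quantized, so the sketch below assumes sign binarization scaled by the mean absolute weight and uniform k-bit quantization of non-negative activations. All function names and parameters (`bits`, `a_max`, `seed`) are hypothetical. It measures the gap between the full-precision dot product and its low-precision approximation and samples that gap over random vectors, giving an error distribution of the kind shown in panels (e) to (h).

```python
import random
import statistics
from typing import List, Tuple


def binarize_weights(w: List[float]) -> List[float]:
    """Sign binarization with one scale factor alpha = mean(|w|) (assumed scheme)."""
    alpha = sum(abs(x) for x in w) / len(w)
    return [alpha if x >= 0 else -alpha for x in w]


def quantize_activations(a: List[float], bits: int, a_max: float) -> List[float]:
    """Uniform quantization of activations clipped to [0, a_max] onto 2**bits levels."""
    step = a_max / (2 ** bits - 1)
    return [round(min(max(x, 0.0), a_max) / step) * step for x in a]


def dot(u: List[float], v: List[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def dot_product_error(
    w: List[float], a: List[float], bits: int = 2, a_max: float = 1.0
) -> Tuple[float, float, float]:
    """Return (full-precision dot, low-precision dot, error = low - full)."""
    exact = dot(w, a)
    approx = dot(binarize_weights(w), quantize_activations(a, bits, a_max))
    return exact, approx, approx - exact


def sample_errors(n: int, dim: int, bits: int = 2, seed: int = 0) -> List[float]:
    """Errors for n random pairs: Gaussian weights, activations uniform in [0, 1)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        a = [rng.random() for _ in range(dim)]
        errors.append(dot_product_error(w, a, bits)[2])
    return errors


if __name__ == "__main__":
    errs = sample_errors(n=1000, dim=64)
    print("mean error:", statistics.mean(errs), "std:", statistics.stdev(errs))
```

Summarizing the sampled errors by mean and standard deviation mirrors how the caption characterizes each layer's distribution; a histogram of `errs` would play the role of one error curve.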

Fig. 7 Visual performance comparison of several completion networks

Fig. 8 Surrounding objects predicted by our network

Fig. 9 Effect of the improved TSDF encoding on semantic scene completion

Table 1 Completion and semantic classification performance of our network compared with the L and GW networks (%)

Task	Metric / class	L	GW	Ours (NYU)	Ours (LS_3DDS)	Ours (NYU+LS_3DDS)
Completion	Closure rate	59.6	66.8	57.0	55.6	69.3
	IoU	37.8	46.4	59.1	58.2	58.6
Semantic scene completion	Ceiling	0	14.2	17.1	8.8	19.1
	Floor	15.7	65.5	92.7	85.8	94.6
	Wall	16.7	17.1	28.4	15.6	29.7
	Window	15.6	8.7	0	7.4	18.8
	Chair	9.4	4.5	15.6	18.9	19.3
	Bed	27.3	46.6	37.1	37.4	53.6
	Sofa	22.9	25.7	38.0	28.0	47.9
	Table	7.2	9.3	18.0	18.7	19.9
	Monitor	7.6	7.0	9.8	7.1	12.9
	Furniture	15.6	27.7	28.1	10.4	30.1
	Objects	2.1	8.3	15.1	6.4	11.6
	Average	18.3	26.8	32.0	27.6	37.3
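Table 1 reports per-class scores and an average for semantic scene completion. Assuming these are voxel-wise intersection-over-union values averaged over the scored classes (the page does not define them), they could be computed as in this sketch. The flat label lists, the convention that class 0 means empty space, and the function names are hypothetical.

```python
from typing import Dict, Sequence


def per_class_iou(
    pred: Sequence[int], gt: Sequence[int], classes: Sequence[int]
) -> Dict[int, float]:
    """Voxel-wise IoU for each class id in `classes` (0 is assumed to mean empty).

    Classes that occur in neither the prediction nor the ground truth are skipped.
    """
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must cover the same voxels")
    ious: Dict[int, float] = {}
    for c in classes:
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:
            ious[c] = inter / union
    return ious


def mean_iou(ious: Dict[int, float]) -> float:
    """Unweighted average over the classes that were scored."""
    return sum(ious.values()) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [1, 1, 2, 0, 2]  # predicted class per voxel
    gt = [1, 2, 2, 0, 0]    # ground-truth class per voxel
    scores = per_class_iou(pred, gt, classes=[1, 2])
    print(scores, mean_iou(scores))
```

An unweighted mean gives rare classes such as monitors the same weight as large surfaces such as floors, which is the usual reason per-class averages are reported alongside overall completion IoU.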

Table 2 Reconstruction performance of our network compared with the F and Z networks (%)

Method	Training dataset	Completion accuracy	Closure rate	IoU
F completion method	NYU	66.5	69.7	50.8
Z completion method	NYU	60.1	46.7	34.6
Ours (completion)	NYU	66.3	96.9	64.8
Ours (semantic completion)	NYU	75.0	92.3	70.3
Ours (semantic completion)	LS_3DDS	75.0	96.0	73.0
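Table 2 compares completion accuracy, closure rate, and IoU, but the page does not define these metrics. As an assumption, the sketch below shows the standard voxel-occupancy precision, recall, and IoU computed from sets of occupied voxel indices, together with the identity IoU = PR / (P + R - PR) that links the three when they come from the same counts. Restricting both sets to the evaluated region (for example, occluded voxels inside the camera frustum) is left to the caller, and the function names are hypothetical.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def completion_metrics(pred: Set[Voxel], gt: Set[Voxel]) -> Tuple[float, float, float]:
    """Precision, recall and IoU of predicted vs. ground-truth occupied voxels."""
    tp = len(pred & gt)  # occupied in both
    fp = len(pred - gt)  # predicted occupied, actually empty
    fn = len(gt - pred)  # occupied voxels the prediction missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, iou


def iou_from_pr(precision: float, recall: float) -> float:
    """IoU = PR / (P + R - PR); valid when P and R share the same nonzero TP count."""
    denom = precision + recall - precision * recall
    return precision * recall / denom if denom else 0.0


if __name__ == "__main__":
    pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
    gt = {(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 1, 1)}
    p, r, iou = completion_metrics(pred, gt)
    print(p, r, iou, iou_from_pr(p, r))
```

Because IoU penalizes both false positives and false negatives, it can never exceed precision or recall computed from the same voxel counts; the identity above makes that relationship explicit.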

References

[1]	Gupta S, Arbeláez P, Malik J. Perceptual organization and recognition of indoor scenes from RGB-D images. In: Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Portland, OR, USA: IEEE, 2013. 564-571
[2]	Ren X F, Bo L F, Fox D. RGB-(D) scene labeling: features and algorithms. In: Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Providence, RI, USA: IEEE, 2012. 2759-2766
[3]	Silberman N, Hoiem D, Kohli P, Fergus R. Indoor segmentation and support inference from RGBD images. In: Proceedings of the 12th European Conference on Computer Vision. Florence, Italy: Springer, 2012. 746-760 doi: 10.1007/978-3-642-33715-4_54
[4]	Lai K, Bo L F, Fox D. Unsupervised feature learning for 3D scene labeling. In: Proceedings of 2014 IEEE International Conference on Robotics and Automation (ICRA). Hong Kong, China: IEEE, 2014. 3050-3057
[5]	Rock J, Gupta T, Thorsen J, Gwak J Y, Shin D, Hoiem D. Completing 3D object shape from one depth image. In: Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA: IEEE, 2015. 2484-2493
[6]	Shah S A A, Bennamoun M, Boussaid F. Keypoints-based surface representation for 3D modeling and 3D object recognition. Pattern Recognition, 2017, 64:29-38
[7]	Ren C Y, Prisacariu V A, Kähler O, Reid I D, Murray D W. Real-time tracking of single and multiple objects from depth-colour imagery using 3D signed distance functions. International Journal of Computer Vision, 2017, 124(1):80-95
[8]	Gupta S, Arbeláez P, Girshick R, Malik J. Aligning 3D models to RGB-D images of cluttered scenes. In: Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA: IEEE, 2015. 4731-4740
[9]	Song S R, Xiao J X. Sliding shapes for 3D object detection in depth images. In: Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014. 634-651
[10]	Li X, Fang M, Zhang J J, Wu J Q. Learning coupled classifiers with RGB images for RGB-D object recognition. Pattern Recognition, 2017, 61:433-446 doi: 10.1016/j.patcog.2016.08.016
[11]	Nan L L, Xie K, Sharf A. A search-classify approach for cluttered indoor scene understanding. ACM Transactions on Graphics (TOG), 2012, 31(6): Article No. 137
[12]	Lin D H, Fidler S, Urtasun R. Holistic scene understanding for 3D object detection with RGBD cameras. In: Proceedings of 2013 IEEE International Conference on Computer Vision (ICCV). Sydney, NSW, Australia: IEEE, 2013. 1417-1424
[13]	Ohn-Bar E, Trivedi M M. Multi-scale volumes for deep object detection and localization. Pattern Recognition, 2017, 61:557-572 doi: 10.1016/j.patcog.2016.06.002
[14]	Zheng B, Zhao Y B, Yu J C, Ikeuchi K, Zhu S C. Beyond point clouds: scene understanding by reasoning geometry and physics. In: Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Portland, OR, USA: IEEE, 2013. 3127-3134
[15]	Kim B S, Kohli P, Savarese S. 3D scene understanding by voxel-CRF. In: Proceedings of 2013 IEEE International Conference on Computer Vision (ICCV). Sydney, NSW, Australia: IEEE, 2013. 1425-1432
[16]	Häne C, Zach C, Cohen A, Angst R. Joint 3D scene reconstruction and class segmentation. In: Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Portland, OR, USA: IEEE, 2013. 97-104
[17]	Bláha M, Vogel C, Richard A, Wegner J D, Pock T, Schindler K. Large-scale semantic 3D reconstruction: an adaptive multi-resolution model for multi-class volumetric labeling. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE, 2016. 3176-3184
[18]	Handa A, Patraucean V, Badrinarayanan V, Stent S, Cipolla R. Understanding real world indoor scenes with synthetic data. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE, 2016. 4077-4085
[19]	Lv C H, Shen Y H, Li J H. Depth map inpainting method based on Kinect sensor. Journal of Jilin University (Engineering and Technology Edition), 2016, 46(5):1697-1703
[20]	Hu C S, Zhan S, Wu C Z. Image super-resolution based on deep learning features. Acta Automatica Sinica, 2017, 43(5):814-821
[21]	Wang P S, Liu Y, Guo Y X, Sun C Y, Tong X. O-CNN: octree-based convolutional neural networks for 3D shape analysis. ACM Transactions on Graphics (TOG), 2017, 36(4): Article No. 72
[22]	Yücer K, Sorkine-Hornung A, Wang O, Sorkine-Hornung O. Efficient 3D object segmentation from densely sampled light fields with applications to 3D reconstruction. ACM Transactions on Graphics (TOG), 2016, 35(3): Article No. 22
[23]	Hyvärinen A, Oja E. Independent component analysis: algorithms and applications. Neural Networks, 2000, 13(4-5):411-430 doi: 10.1016/S0893-6080(00)00026-5
[24]	Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA: IEEE, 2015. 3431-3440
[25]	Guo R Q, Zou C H, Hoiem D. Predicting complete 3D models of indoor scenes. arXiv: 1504.02437, 2015
[26]	Sun X, Li X G, Li J F, Zhuo L. Review on deep learning based image super-resolution restoration algorithms. Acta Automatica Sinica, 2017, 43(5):697-709
