A Deep Residual-Feature Pyramid Network Framework for Scattered Point Cloud Semantic Segmentation
Abstract: To address the poor generality of current deep-learning-based semantic feature extraction methods for scattered point clouds, and the poor segmentation accuracy and reliability caused by insufficient feature extraction, a deep residual-feature pyramid network framework for scattered point cloud semantic segmentation is proposed. First, to overcome the limitations of current residual networks in their convolution scheme, a cube convolution operation is defined, which not only extracts high-level features of three-dimensional representation points through a two-dimensional convolution, but also resolves the poor generality of existing parameterized convolution designs. Second, the defined cube convolution is integrated into the residual network to construct a deep residual feature learning network framework for scattered point cloud semantic segmentation. Further, the deep residual network is combined with a feature pyramid network to realize multi-scale learning of the high-level features of three-dimensional representation points and semantic segmentation of scattered point cloud scenes. Experimental results show that the proposed cube convolution has good applicability, and that the proposed deep residual-feature pyramid network framework outperforms existing methods of the same kind in segmentation accuracy.

Manuscript received January 26, 2019; accepted July 30, 2019

Supported by National Key Research and Development Program of China (2017YFB0306402), National Natural Science Foundation of China (51305390, 61601401), Natural Science Foundation of Hebei Province (F2016203312, E2020303188), Young Talent Program of Colleges in Hebei Province (BJ2018018), and Key Foundation of Hebei Educational Committee (ZD2019039)

Recommended by Associate Editor HUANG Qing-Ming

1. School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004  2. School of Electrical Engineering, Yanshan University, Qinhuangdao 066004
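The cube convolution is described above only at a high level. As a rough illustration, the following PyTorch sketch (our assumption, not the authors' code) shows one plausible reading that is consistent with the "$1\times27$, 64" kernels of Table 1: each point's cubic neighborhood is split into $3\times3\times3 = 27$ sub-cubes, the points falling in each sub-cube are pooled into one feature vector, and a two-dimensional convolution with a $1\times27$ kernel then produces one high-level feature vector per point.

```python
import torch
import torch.nn as nn

class CubeConv(nn.Module):
    """Hypothetical sketch of the cube convolution.

    Assumption: each point's cubic neighborhood (side length as in
    Table 1) has been split into 3x3x3 = 27 sub-cubes and the points
    in each sub-cube pooled into one feature vector, giving a tensor
    of shape (B, C_in, N, 27). A 2D convolution with a 1x27 kernel
    then yields one high-level feature vector per point.
    """

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=(1, 27))
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C_in, N, 27) -> (B, C_out, N, 1)
        return self.relu(self.bn(self.conv(x)))

# Example: 64-d features for 8 192 points, each with 27 sub-cube bins.
feats = torch.randn(2, 64, 8192, 27)
out = CubeConv(64, 64)(feats)   # -> (2, 64, 8192, 1)
```

Because the kernel height is 1, the same weights are shared across all points, so the operation is independent of point count and point ordering, which is what makes a two-dimensional convolution applicable to unordered three-dimensional representation points.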
Table 1 The parameter design

| Layer | Kernel size | Cube side length (m) | Number of output points | Output feature channels |
| --- | --- | --- | --- | --- |
| Cube convolution | $1\times27$, 64 | 0.1 | 8 192 | 64 |
| Cube max pooling | $1\times27$, 64 | 0.1 | 2 048 | 64 |
| Residual block 1 | $\left[\begin{matrix}1\times1,64\\1\times27,64\\1\times1,256\end{matrix}\right]\times3$ | 0.2 | 2 048 | 256 |
| Residual block 2 | $\left[\begin{matrix}1\times1,128\\1\times27,128\\1\times1,512\end{matrix}\right]\times4$ | 0.4 | 512 | 512 |
| Residual block 3 | $\left[\begin{matrix}1\times1,256\\1\times27,256\\1\times1,1024\end{matrix}\right]\times6$ | 0.8 | 128 | 1 024 |
| Convolution layer 1 | $1\times1$, 256 | — | 128 | 256 |
| Convolution layer 2 | $1\times1$, 256 | — | 512 | 256 |
| Convolution layer 3 | $1\times1$, 256 | — | 2 048 | 256 |
| Convolution layer 4 | $1\times1$, 256 | — | 8 192 | 256 |
| Fully connected layer | — | — | 8 192 | 20 |
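The residual blocks in Table 1 follow a bottleneck pattern built around the $1\times27$ cube kernel: a $1\times1$ reduction, the $1\times27$ cube convolution, and a $1\times1$ expansion. The sketch below is hypothetical; in particular, how the shortcut path crosses the 27-bin sub-cube axis is not specified in this excerpt, so max-pooling the bins before the shortcut projection is our assumption.

```python
import torch
import torch.nn as nn

class CubeBottleneck(nn.Module):
    """One bottleneck unit of Table 1's residual blocks (our reading):
    1x1 reduction -> 1x27 cube convolution over the 27 sub-cube bins
    -> 1x1 expansion, plus a projected shortcut. Max-pooling the bins
    on the shortcut path is an assumption, not the paper's design."""

    def __init__(self, in_ch: int, mid_ch: int, out_ch: int):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True))
        self.cube = nn.Sequential(  # the 1x27 kernel of Table 1
            nn.Conv2d(mid_ch, mid_ch, (1, 27)),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True))
        self.expand = nn.Sequential(
            nn.Conv2d(mid_ch, out_ch, 1), nn.BatchNorm2d(out_ch))
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, in_ch, N, 27) -> (B, out_ch, N, 1)
        y = self.expand(self.cube(self.reduce(x)))
        s = self.shortcut(x.max(dim=3, keepdim=True).values)
        return self.relu(y + s)

# Residual block 1 of Table 1: [1x1,64; 1x27,64; 1x1,256] on 2 048 points.
block = CubeBottleneck(64, 64, 256)
print(block(torch.randn(2, 64, 2048, 27)).shape)  # torch.Size([2, 256, 2048, 1])
```

Note that stacking such units $\times3$, $\times4$, and $\times6$ as in Table 1 would require re-gathering the 27 sub-cube bins at each scale (cube side 0.2 m, 0.4 m, 0.8 m); this sketch shows a single unit only.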
Table 2 Segmentation result comparisons on the S3DIS dataset (%)
Table 3 Comparison of IoU for all categories on the S3DIS dataset (%)
| Category | PointNet [7] | RSNet [9] | PointCNN [10] | ResNet-FPN_C (ours) | U-Net_C (ours) |
| --- | --- | --- | --- | --- | --- |
| ceiling | 88.0 | 92.48 | 94.78 | 91.99 | 91.46 |
| floor | 88.7 | 92.83 | 97.30 | 94.99 | 94.12 |
| wall | 69.3 | 78.56 | 75.82 | 77.04 | 79.00 |
| beam | 42.4 | 32.75 | 63.25 | 50.29 | 51.92 |
| column | 23.1 | 34.37 | 51.71 | 39.40 | 40.35 |
| window | 47.5 | 51.62 | 58.38 | 65.57 | 65.63 |
| door | 51.6 | 68.11 | 57.18 | 72.38 | 72.60 |
| table | 54.1 | 60.13 | 71.63 | 72.20 | 71.57 |
| chair | 42.0 | 59.72 | 69.12 | 77.10 | 77.56 |
| sofa | 9.6 | 50.22 | 39.08 | 54.87 | 55.89 |
| bookcase | 38.2 | 16.42 | 61.15 | 59.24 | 59.10 |
| board | 29.4 | 44.85 | 52.19 | 53.44 | 54.54 |
| clutter | 35.2 | 52.03 | 58.59 | 63.11 | 62.11 |
Table 4 Segmentation result comparisons on the ScanNet dataset (%)
Table 5 Comparison of IoU for all categories on the ScanNet dataset (%)
| Category | PointNet [7] | PointNet++ [8] | RSNet [9] | PointCNN [10] | ResNet-FPN_C (ours) | U-Net_C (ours) |
| --- | --- | --- | --- | --- | --- | --- |
| wall | 69.44 | 77.48 | 79.23 | 74.5 | 77.1 | 77.8 |
| floor | 88.59 | 92.50 | 94.10 | 90.7 | 90.6 | 90.8 |
| chair | 35.93 | 64.55 | 64.99 | 68.8 | 76.4 | 76.6 |
| table | 32.78 | 46.60 | 51.04 | 55.3 | 52.5 | 50.8 |
| desk | 2.63 | 12.69 | 34.53 | 28.8 | 29.1 | 25.8 |
| bed | 17.96 | 51.32 | 55.95 | 56.1 | 57.0 | 57.4 |
| bookshelf | 3.18 | 52.93 | 53.02 | 38.9 | 42.7 | 42.3 |
| sofa | 32.79 | 52.27 | 55.41 | 60.1 | 61.5 | 60.9 |
| sink | 0.00 | 30.23 | 34.84 | 41.9 | 41.8 | 41.7 |
| bathtub | 0.17 | 42.72 | 49.38 | 73.5 | 61.0 | 68.6 |
| toilet | 0.00 | 31.37 | 54.16 | 73.4 | 69.8 | 73.2 |
| curtain | 0.00 | 32.97 | 6.78 | 36.1 | 35.4 | 41.9 |
| counter | 5.09 | 20.04 | 22.72 | 22.5 | 20.0 | 19.7 |
| door | 0.00 | 2.02 | 3.00 | 7.5 | 26.0 | 28.4 |
| window | 0.00 | 3.56 | 8.75 | 11.0 | 15.9 | 18.8 |
| shower curtain | 0.00 | 27.43 | 29.92 | 40.6 | 38.1 | 42.6 |
| refrigerator | 0.00 | 18.51 | 37.90 | 43.4 | 47.3 | 52.3 |
| picture | 0.00 | 0.00 | 0.95 | 1.3 | 5.8 | 8.6 |
| cabinet | 4.99 | 23.81 | 31.29 | 26.4 | 27.5 | 27.5 |
| other furniture | 0.13 | 2.20 | 18.98 | 23.6 | 25.8 | 24.5 |
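Tables 3 and 5 report per-class IoU, the standard point-level metric $\mathrm{IoU} = \mathrm{TP}/(\mathrm{TP} + \mathrm{FP} + \mathrm{FN})$ accumulated over all test points, with mIoU being its mean over classes. For reference, a minimal NumPy sketch of this standard metric (not code from the paper):

```python
import numpy as np

def per_class_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> np.ndarray:
    """Per-class IoU = TP / (TP + FP + FN), computed from a confusion
    matrix accumulated over all points."""
    # Confusion matrix: rows index ground truth, columns index prediction.
    cm = np.bincount(num_classes * gt + pred, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes).astype(np.float64)
    tp = np.diag(cm)
    denom = cm.sum(axis=0) + cm.sum(axis=1) - tp  # TP + FP + FN per class
    return tp / np.maximum(denom, 1)              # avoid division by zero

# Example: mIoU over the 13 S3DIS classes for random labels.
pred = np.random.randint(0, 13, size=100000)
gt = np.random.randint(0, 13, size=100000)
print(per_class_iou(pred, gt, 13).mean())
```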
[1] Anguelov D, Taskar B, Chatalbashev V, Koller D, Gupta D, Heitz G, et al. Discriminative learning of Markov random fields for segmentation of 3D scan data. In: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA: IEEE, 2005. 169−176
[2] Wu Z R, Song S R, Khosla A, Yu F, Zhang L G, Tang X O, et al. 3D ShapeNets: A deep representation for volumetric shapes. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015. 1912−1920
[3] Maturana D, Scherer S. VoxNet: A 3D convolutional neural network for real-time object recognition. In: Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany: IEEE, 2015. 922−928
[4] Riegler G, Ulusoy A O, Geiger A. OctNet: Learning deep 3D representations at high resolutions. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 6620−6629
[5] Wang P S, Liu Y, Guo Y X, Sun C Y, Tong X. O-CNN: Octree-based convolutional neural networks for 3D shape analysis. ACM Transactions on Graphics, 2017, 36(4): Article No. 72
[6] Klokov R, Lempitsky V. Escape from cells: Deep Kd-networks for the recognition of 3D point cloud models. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 863−872
[7] Qi C R, Su H, Mo K C, Guibas L J. PointNet: Deep learning on point sets for 3D classification and segmentation. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 77−85
[8] Qi C R, Yi L, Su H, Guibas L J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In: Proceedings of the 2017 Advances in Neural Information Processing Systems. Long Beach, USA: Curran Associates, 2017. 5100−5109
[9] Huang Q G, Wang W Y, Neumann U. Recurrent slice networks for 3D segmentation of point clouds. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 2626−2635
[10] Li Y Y, Bu R, Sun M C, Wu W, Di X H, Chen B Q. PointCNN: Convolution on X-transformed points. In: Proceedings of the 2018 Advances in Neural Information Processing Systems. Montreal, Canada: Curran Associates, 2018. 828−838
[11] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: MICCAI, 2015. 234−241
[12] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 770−778
[13] Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 2818−2826
[14] Huang G, Liu Z, Maaten L V D, Weinberger K Q. Densely connected convolutional networks. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 2261−2269
[15] Lin T Y, Dollar P, Girshick R, He K M, Hariharan B, Belongie S. Feature pyramid networks for object detection. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 936−944
[16] Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE, 2014. 580−587
[17] Girshick R. Fast R-CNN. In: Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 1440−1448
[18] Ren S Q, He K M, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137−1149. doi: 10.1109/TPAMI.2016.2577031
[19] He K M, Gkioxari G, Dollar P, Girshick R. Mask R-CNN. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 2980−2988
[20] Liu S, Qi L, Qin H F, Shi J P, Jia J. Path aggregation network for instance segmentation. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 8759−8768
[21] Armeni I, Sener O, Zamir A R, Jiang H L, Brilakis I, Fischer M, Savarese S. 3D semantic parsing of large-scale indoor spaces. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 1534−1543
[22] Dai A, Chang A X, Savva M, Halber M, Funkhouser T, Nießner M. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 2432−2443