Complex Scene Segmentation Based on Visible and Thermal Images in Driving Environment
-
Keywords:
- Scene segmentation /
- Visible image /
- Thermal infrared image /
- Dual-modal segmentation network /
- Dual-path feature space adaptation module
Abstract: Complex scene segmentation is an important task of intelligent perception in the field of autonomous driving, and it places high demands on both stability and efficiency. General scene segmentation methods mainly target visible images, so the segmentation result is highly dependent on the lighting and weather conditions at the time of image acquisition; moreover, most methods focus only on segmentation performance and ignore computing resources. This paper proposes a lightweight dual-modal segmentation network (DMSNet) based on visible and thermal images, which extracts and fuses the features of the two modalities to obtain the final segmentation result. Because the feature spaces of different modalities differ considerably, fusing them directly would reduce the utilization of the features. This paper therefore proposes a dual-path feature space adaptation (DPFSA) module, which automatically learns the differences among features and converts them into the same space. Experimental results show that the proposed method makes better use of the information in the two modalities, is more robust to illumination changes, and achieves good segmentation performance with only a small number of parameters.

Manuscript received January 9, 2021; accepted April 16, 2021. Supported by the National Natural Science Foundation of China (62076256) and the Graduate School-Enterprise Joint Innovation Project of Central South University (2021XQLH048). Recommended by Associate Editor ZHANG Xiang-Rong.
1. School of Automation, Central South University, Changsha 410083
2. School of Computer Science and Engineering, Central South University, Changsha 410083
3. Hunan Engineering & Technology Research Center of High Strength Fastener Intelligent Manufacturing, Changde 415701
4. Hunan Xiangjiang Artificial Intelligence Academy, Changsha 410005
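To make the dual-path idea described in the abstract concrete, the following is a minimal PyTorch sketch of a two-stream encoder with a feature-space adaptation step before fusion. It is only an illustration of the general approach; the layer sizes, the `FeatureSpaceAdapter` design, and the element-wise fusion are assumptions for this sketch, not the exact DPFSA module or DMSNet architecture reported in the paper.

```python
import torch
import torch.nn as nn

class FeatureSpaceAdapter(nn.Module):
    """Hypothetical adapter: learns a residual mapping that pulls one
    modality's features toward a shared feature space before fusion."""
    def __init__(self, channels):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        # Residual adaptation: keep the original features, add a learned correction.
        return x + self.transform(x)

class DualModalFusionBlock(nn.Module):
    """Minimal two-stream block: separate encoders for visible and thermal
    inputs, per-path adaptation, then element-wise fusion."""
    def __init__(self, in_rgb=3, in_thermal=1, channels=32):
        super().__init__()
        self.rgb_encoder = nn.Sequential(
            nn.Conv2d(in_rgb, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.thermal_encoder = nn.Sequential(
            nn.Conv2d(in_thermal, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.adapt_rgb = FeatureSpaceAdapter(channels)
        self.adapt_thermal = FeatureSpaceAdapter(channels)

    def forward(self, rgb, thermal):
        f_rgb = self.adapt_rgb(self.rgb_encoder(rgb))
        f_th = self.adapt_thermal(self.thermal_encoder(thermal))
        return f_rgb + f_th  # fused feature map

# Example: fuse a 480x640 visible image with its thermal counterpart.
block = DualModalFusionBlock()
fused = block(torch.randn(1, 3, 480, 640), torch.randn(1, 1, 480, 640))
print(fused.shape)  # torch.Size([1, 32, 480, 640])
```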
Table 1 Comparison of mAcc, mIoU and parameter counts of different modules on dataset A
| Models | mAcc (%) | mIoU (%) | Parameters |
| --- | --- | --- | --- |
| MFNet | 63.5 | 64.9 | 2.81 MB |
| FuseNet | 61.9 | 63.8 | 46.4 MB |
| DMSNet (DPFSA-1) | 65.6 | 68.1 | 5.45 MB |
| DMSNet (DPFSA-2) | 68.9 | 65.1 | 5.54 MB |
| DMSNet (DPFSA) | 69.7 | 69.6 | 5.63 MB |

Note: Parameters denotes the parameter count of the entire segmentation model, not of the module alone.
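For reference, the mAcc and mIoU values reported in Tables 1 to 5 can be reproduced from a confusion matrix with the standard per-class definitions sketched below; this snippet assumes Acc denotes per-class recall and that the means are taken over all classes, the usual convention for this benchmark, which the excerpt does not restate.

```python
import numpy as np

def segmentation_metrics(conf):
    """conf[i, j]: number of pixels with ground-truth class i predicted as class j."""
    conf = conf.astype(np.float64)
    tp = np.diag(conf)
    fn = conf.sum(axis=1) - tp          # pixels of each class that were missed
    fp = conf.sum(axis=0) - tp          # pixels wrongly assigned to each class
    acc = tp / np.maximum(tp + fn, 1)   # per-class accuracy (recall); guard empty classes
    iou = tp / np.maximum(tp + fp + fn, 1)
    return acc, iou, acc.mean(), iou.mean()

# Toy example with 3 classes.
conf = np.array([[50, 2, 3],
                 [4, 40, 6],
                 [1, 2, 30]])
acc, iou, m_acc, m_iou = segmentation_metrics(conf)
print(m_acc, m_iou)
```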
Table 2 Acc results and mAcc and mIoU values of different loss functions on dataset A
| Losses | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | mAcc | mIoU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CE | 97.6 | 86.5 | 84.9 | 77.8 | 69.5 | 53.3 | 0.0 | 79.8 | 77.4 | 69.7 | 69.6 |
| Focal | 97.3 | 78.7 | 80.5 | 67.8 | 55.1 | 41.6 | 0.0 | 63.5 | 50.8 | 59.5 | 65.6 |
| Dice | 96.8 | 77.7 | 83.8 | 0.0 | 0.0 | 0.0 | 0.0 | 36.6 | 0.0 | 32.8 | 25.3 |
| CE+Dice | 97.6 | 87.6 | 83.5 | 79.5 | 73.2 | 47.5 | 0.0 | 74.7 | 92.1 | 70.7 | 70.3 |

Note: Columns 1-9 give per-class Acc (%); the numbers are the segmentation class labels: 1: Unlabeled, 2: Car, 3: Pedestrian, 4: Bike, 5: Curve, 6: Car stop, 7: Guardrail, 8: Color cone, 9: Bump.
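As a concrete reference for the best-performing row of Table 2 (CE+Dice), a combined cross-entropy and Dice objective can be written as below. The unweighted sum and the soft-Dice formulation are illustrative assumptions, since the exact combination used in the paper is not specified in this excerpt.

```python
import torch
import torch.nn.functional as F

def ce_dice_loss(logits, target, num_classes, eps=1e-6):
    """logits: (N, C, H, W) raw scores; target: (N, H, W) integer class labels."""
    ce = F.cross_entropy(logits, target)

    # Soft Dice computed over all classes.
    probs = torch.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)
    intersection = (probs * one_hot).sum(dims)
    cardinality = probs.sum(dims) + one_hot.sum(dims)
    dice = (2.0 * intersection + eps) / (cardinality + eps)
    dice_loss = 1.0 - dice.mean()

    return ce + dice_loss  # unweighted sum (illustrative choice)

# Example usage with 9 classes, matching the labeling in Table 2.
logits = torch.randn(2, 9, 64, 64, requires_grad=True)
labels = torch.randint(0, 9, (2, 64, 64))
loss = ce_dice_loss(logits, labels, num_classes=9)
loss.backward()
```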
Table 3 Comparison of Acc and IoU results of different models on dataset A
| Models | 2 (Acc / IoU) | 3 (Acc / IoU) | 4 (Acc / IoU) | 5 (Acc / IoU) | 6 (Acc / IoU) | 7 (Acc / IoU) | 8 (Acc / IoU) | 9 (Acc / IoU) | mAcc | mIoU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SegNet (3ch) | 82.6 / 94.1 | 67.7 / 75.6 | 73.7 / 80.8 | 55.9 / 97.1 | 39.1 / 43.5 | 0.0 / 0.0 | 0.0 / 0.0 | 48.9 / 86.8 | 51.7 | 59.7 |
| SegNet (4ch) | 84.4 / 93.1 | 85.5 / 84.7 | 76.0 / 74.7 | 58.2 / 96.5 | 44.2 / 43.6 | 0.0 / 0.0 | 0.0 / 0.0 | 74.4 / 95.6 | 57.8 | 60.9 |
| ENet (3ch) | 85.3 / 92.3 | 53.8 / 68.4 | 67.7 / 71.7 | 52.2 / 95.7 | 16.9 / 24.2 | 0.0 / 0.0 | 0.0 / 0.0 | 0.0 / 0.0 | 41.5 | 43.8 |
| ENet (4ch) | 75.5 / 89.6 | 68.1 / 71.7 | 66.8 / 67.6 | 63.2 / 88.5 | 41.5 / 34.1 | 0.0 / 0.0 | 0.0 / 0.0 | 93.2 / 78.1 | 56.2 | 53.6 |
| FuseNet | 76.8 / 91.2 | 69.3 / 80.5 | 71.2 / 78.6 | 60.1 / 95.8 | 30.8 / 28.1 | 0.0 / 0.0 | 68.4 / 37.9 | 83.1 / 98.5 | 61.9 | 63.8 |
| MFNet | 78.9 / 92.9 | 82.7 / 84.8 | 68.1 / 75.7 | 64.4 / 97.2 | 31.6 / 29.7 | 0.0 / 0.0 | 71.8 / 40.6 | 77.1 / 98.4 | 63.5 | 64.9 |
| DMSNet | 87.6 / 95.8 | 83.5 / 88.7 | 79.5 / 82.5 | 73.2 / 97.9 | 47.5 / 35.7 | 0.0 / 0.0 | 74.7 / 62.0 | 92.1 / 99.8 | 70.7 | 70.3 |

Note: Columns 2-9 are the segmentation class labels, named as in Table 2; each cell gives Acc / IoU (%).
Table 4 Comparison of Acc and IoU results of different models on dataset B
| Models | 2 (Acc / IoU) | 3 (Acc / IoU) | 4 (Acc / IoU) | 5 (Acc / IoU) | mAcc | mIoU |
| --- | --- | --- | --- | --- | --- | --- |
| SegNet (3ch) | 0.0 / 0.0 | 71.2 / 79.3 | 0.0 / 0.0 | 21.6 / 47.1 | 38.4 | 31.6 |
| SegNet (4ch) | 0.0 / 0.0 | 62.9 / 70.1 | 0.0 / 0.0 | 30.5 / 46.8 | 38.5 | 29.2 |
| ENet (3ch) | 0.0 / 0.0 | 77.6 / 85.5 | 0.0 / 0.0 | 73.4 / 90.9 | 49.9 | 44.1 |
| ENet (4ch) | 0.0 / 0.0 | 72.9 / 74.9 | 0.0 / 0.0 | 74.8 / 89.6 | 49.1 | 41.1 |
| FuseNet | 72.7 / 43.1 | 91.4 / 92.3 | 74.4 / 78.9 | 99.9 / 99.8 | 87.4 | 78.5 |
| MFNet | 66.7 / 47.0 | 88.7 / 91.0 | 95.2 / 90.1 | 96.3 / 99.8 | 89.1 | 81.9 |
| DMSNet | 67.8 / 43.5 | 89.1 / 90.4 | 96.3 / 97.5 | 99.3 / 99.9 | 90.2 | 82.8 |

Note: Columns 2-5 are the segmentation class labels, with each cell giving Acc / IoU (%): 2: Fire-Extinguisher, 3: Backpack, 4: Hand-Drill, 5: Survivor.
Table 5 Comparison of mAcc and mIoU results of different models on dataset A in daytime and nighttime
| Models | Daytime mAcc (%) | Daytime mIoU (%) | Nighttime mAcc (%) | Nighttime mIoU (%) |
| --- | --- | --- | --- | --- |
| SegNet (3ch) | 47.8 | 55.5 | 52.6 | 61.3 |
| SegNet (4ch) | 45.4 | 49.3 | 58.2 | 62.9 |
| ENet (3ch) | 42.1 | 40.8 | 38.6 | 39.1 |
| ENet (4ch) | 44.1 | 45.9 | 57.1 | 54.3 |
| FuseNet | 50.6 | 61.2 | 63.4 | 64.7 |
| MFNet | 49.0 | 63.3 | 65.8 | 65.1 |
| DMSNet | 57.7 | 69.1 | 71.8 | 71.3 |