Abstract: Occlusion and similar-looking objects in the background are the main causes of low accuracy in pedestrian detection. To address these problems, this paper proposes a pedestrian detection algorithm that combines semantics with multi-level feature fusion (CSMFF). First, features from multiple convolutional layers are fused, and semantic segmentation is added on the fusion layer. The resulting semantic features are connected to the corresponding convolutional layers as prior information on pedestrian location, which enhances the discrimination between pedestrians and background. Then, building on the preliminary regression, a pedestrian secondary detection module (PSDM) is constructed to further eliminate false positives. Experimental results show that the miss rates (MR) of the proposed algorithm on the Caltech and CityPersons datasets are 7.06 % and 11.2 %, respectively. The algorithm is strongly robust to occluded pedestrians and can be easily embedded into other detection frameworks.
Key words:
- Pedestrian detection
- semantic segmentation
- feature fusion
- occlusion
- secondary detection
Table 1 Evaluation settings for partial subsets of the Caltech dataset

| Subset | Pedestrian height | Degree of occlusion (occ) |
| --- | --- | --- |
| Reasonable | $> 50$ px | occ $< 0.35$ |
| Partial | $> 50$ px | $0.10 <$ occ $\le 0.35$ |
| Heavy | $> 50$ px | $0.35 <$ occ $\le 0.80$ |

Table 2 Evaluation settings for partial subsets of the CityPersons dataset

| Subset | Pedestrian height | Degree of occlusion (occ) |
| --- | --- | --- |
| Bare | $> 50$ px | occ $\le 0.10$ |
| Reasonable | $> 50$ px | occ $< 0.35$ |
| Partial | $> 50$ px | $0.10 <$ occ $\le 0.35$ |
| Heavy | $> 50$ px | $0.35 <$ occ $\le 0.80$ |

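The subset criteria in Table 2 can be read as simple threshold tests on pedestrian height and occlusion ratio. The following is a minimal sketch of that reading, not the benchmark's official evaluation code; the function name `citypersons_subsets` and its arguments are hypothetical, and the thresholds are copied from Table 2 as printed. Because the subsets overlap (for example, a Bare pedestrian is also Reasonable), the function returns every subset that matches.

```python
from typing import List


def citypersons_subsets(height_px: float, occ: float) -> List[str]:
    """Return every Table 2 subset that a pedestrian annotation falls into.

    height_px: pedestrian height in pixels (hypothetical argument name).
    occ: occluded fraction of the pedestrian, between 0.0 and 1.0.
    """
    # Every subset in Table 2 requires a height above 50 pixels.
    if height_px <= 50:
        return []

    subsets = []
    if occ <= 0.10:
        subsets.append("Bare")
    if occ < 0.35:
        subsets.append("Reasonable")
    if 0.10 < occ <= 0.35:
        subsets.append("Partial")
    if 0.35 < occ <= 0.80:
        subsets.append("Heavy")
    return subsets


if __name__ == "__main__":
    for height, occ in [(80, 0.05), (80, 0.20), (80, 0.50), (40, 0.00)]:
        print(height, occ, citypersons_subsets(height, occ))
```

Dropping the Bare test gives the corresponding check for the Caltech subsets in Table 1, since the remaining thresholds are the same in both tables.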
Table 3 Performance and runtime comparison of the proposed CSMFF with state-of-the-art approaches on the Caltech test dataset

| Method | Reasonable MR (%) | Partial MR (%) | Heavy MR (%) | Speed (s/frame) |
| --- | --- | --- | --- | --- |
| PL-CNN[16] | 12.40 | 16.68 | - | - |
| Faster R-CNN$+$ATT[32] | 10.33 | 22.29 | 45.18 | - |
| MS-CNN[10] | 9.95 | 19.24 | 59.94 | 0.40 |
| RPN$+$BF[13] | 9.58 | 24.23 | 74.36 | 0.60 |
| AdaptFasterRCNN[14] | 9.18 | 26.55 | 57.58 | - |
| F-DNN[21] | 8.65 | 15.41 | 55.13 | 0.30 |
| PCN[20] | 8.45 | 16.09 | 55.81 | - |
| F-DNN$+$SS[21] | 8.18 | 15.11 | 53.76 | 2.48 |
| CSMFF | 7.06 | 14.36 | 50.62 | 0.12 |

Table 4 Performance comparison of our proposed CSMFF with state-of-the-art approaches on the CityPersons test dataset
Table 5 Performance of fusing different convolutional layers on the Caltech test dataset

| Fused convolutional layers (among Conv2_2, Conv3_3, Conv4_3, Conv5_3) | PFEM MR (%) | CSMFF MR (%) |
| --- | --- | --- |
| √ √ √ | 12.22 | 7.06 |
| √ √ √ | 32.42 | 18.15 |
| √ √ √ √ | 18.72 | 11.79 |

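Table 6 Ablation experiments testing each component on the Caltech dataset

| Component | Selection | | | |
| --- | --- | --- | --- | --- |
| Faster R-CNN | √ | | | |
| Multi-level feature fusion | | √ | √ | √ |
| Semantic segmentation branch | | | √ | √ |
| PSDM | | | | √ |
| PFEM MR (%) | 14.93 | 13.27 | 12.58 | 12.22 |
| CSMFF MR (%) | 12.11 | 9.53 | 8.68 | 7.06 |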
