Abstract: Crowd counting in video is of significant value for intelligent surveillance. Camera perspective, background clutter, uneven crowd density, and occlusion between pedestrians limit the accuracy of traditional counting methods based on low-level features. This paper proposes a crowd counting method based on a rank-based spatial pyramid pooling (RSPP) network. The method divides the original image into several sub-regions, each covering the same perspective range, and takes sub-image blocks of a different scale from each sub-region. An RSPP network estimates the number of people in each sub-image block, and the block counts are summed to give the count for the whole image. The proposed image blocking scheme effectively eliminates the influence of camera perspective and uneven crowd density on counting. The proposed rank-based spatial pyramid pooling not only handles sub-image blocks of multiple scales but also addresses two weaknesses of traditional pooling methods: the loss of a large amount of important information and susceptibility to over-fitting. Experimental results show that the proposed method is more accurate and more robust than traditional methods.
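The block-and-sum step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sub-regions are assumed to be horizontal bands, the band boundaries in the usage note are invented, and `count_image` and `estimate_block` are hypothetical names, the latter standing in for the trained RSPP network.

```python
import numpy as np


def count_image(image, bands, estimate_block):
    """Sum per-block count estimates over an image.

    image          -- 2-D array (H, W)
    bands          -- list of (row_start, row_end, block_size); assumed here
                      to be non-overlapping horizontal bands, each tiled
                      with square blocks of its own scale
    estimate_block -- hypothetical callable returning the estimated number
                      of people in one block (stand-in for the network)
    """
    total = 0.0
    width = image.shape[1]
    for row_start, row_end, size in bands:
        for r in range(row_start, row_end, size):
            for c in range(0, width, size):
                # Blocks at the band or image edge may be smaller than size.
                block = image[r:min(r + size, row_end), c:c + size]
                total += estimate_block(block)
    return total
```

For example, with the three block scales listed in Table 2, one could pass `bands=[(0, 28, 28), (28, 72, 44), (72, 136, 64)]` for a 136-row image, so that smaller blocks cover the far part of the scene and larger blocks the near part. Those row boundaries are illustrative only.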
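Table 1 Architecture specifics for crowd CNN model

| Layer | 1 | 2 | 3 | 4 | 5 (output) |
|---|---|---|---|---|---|
| Operation | conv+relu+rsp+rn | conv+relu+rsp+rn | conv+relu+rspp | full | full |
| Channels | 64 | 64 | 64 | 512 | 1 |
| Kernel size | 5×5 | 5×5 | 5×5 | - | - |
| Convolution stride | 1×1 | 1×1 | 1×1 | - | - |
| Pooling size | 3×3 | 3×3 | {4×4, 2×2, 1×1} | - | - |
| Pooling stride | 2×2 | 2×2 | - | - | - |
| Padding size | 2×2×2×2 | 2×2×2×2 | 2×2×2×2 | - | - |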
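Layer 3 in Table 1 pools over a {4×4, 2×2, 1×1} pyramid, which is what allows blocks of different scales to feed the same fully connected layers. The sketch below illustrates only that fixed-length property. It uses plain max pooling inside each bin as a stand-in, since the rank-based selection rule is not spelled out here, and the function name `spatial_pyramid_pool`, the bin-edge rounding, and the feature-map sizes in the usage note are assumptions for illustration.

```python
import numpy as np


def spatial_pyramid_pool(feature_map, levels=(4, 2, 1)):
    """Pool a (C, H, W) feature map into a vector of length C * sum(n * n).

    Each pyramid level n splits the map into an n x n grid of bins.
    Max pooling per bin is a stand-in; the paper's rank-based rule
    would replace the .max(...) call.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceil for the end, so the bins cover
            # the whole map and are never empty.
            r0 = (i * height) // n
            r1 = -(-((i + 1) * height) // n)
            for j in range(n):
                c0 = (j * width) // n
                c1 = -(-((j + 1) * width) // n)
                bin_values = feature_map[:, r0:r1, c0:c1]
                pooled.append(bin_values.max(axis=(1, 2)))
    return np.concatenate(pooled)
```

With 64 channels and the three levels above, the output always has 64 × (16 + 4 + 1) = 1344 entries, whether the incoming feature map is, say, 16×16, 11×11, or 7×7.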
Table 2 Experimental data

| Block scale | Training set | Test set |
|---|---|---|
| 64×64 | 104 000 | 3 600 |
| 44×44 | 104 000 | 4 800 |
| 28×28 | 44 000 | 3 600 |
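Table 3 Testing results of various pooling methods on sub-image blocks of scale 64

| Pooling method | Training MAE | Training MSE | Test MAE | Test MSE |
|---|---|---|---|---|
| Average pooling | 1.12 | 2.29 | 1.52 | 3.13 |
| Max pooling | 0.27 | 0.13 | 0.84 | 1.15 |
| Stochastic pooling | 1.29 | 2.27 | 1.42 | 3.18 |
| Rank-based stochastic pooling | 0.43 | 0.32 | 0.64 | 0.81 |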
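Table 4 Testing results on sub-image blocks

| Block scale | Joint training MAE | Joint training MSE | Separate training MAE | Separate training MSE |
|---|---|---|---|---|
| 64×64 | 0.64 | 0.81 | 0.64 | 0.81 |
| 44×44 | 0.84 | 1.08 | 1.98 | 5.7 |
| 28×28 | 0.72 | 1.06 | 1.68 | 4.16 |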
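The MAE and MSE columns in Tables 3 and 4 are not defined in this section. The sketch below assumes the standard definitions, mean absolute error and mean squared error over per-block counts. One hint that MSE is not a root-mean-square value: max pooling on the training set has MSE 0.13 below MAE 0.27, which a root-mean-square error could never show. The function names are illustrative.

```python
import numpy as np


def mae(predicted, truth):
    """Mean absolute error between predicted and ground-truth counts."""
    predicted = np.asarray(predicted, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean(np.abs(predicted - truth)))


def mse(predicted, truth):
    """Mean squared error (assumed definition; no square root taken)."""
    predicted = np.asarray(predicted, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean((predicted - truth) ** 2))


# Every estimate is off by half a person, so squaring shrinks the error.
print(mae([2.5, 4.5], [3, 4]))  # 0.5
print(mse([2.5, 4.5], [3, 4]))  # 0.25
```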