Abstract: As ever greater importance is attached to public security, surveillance systems with non-overlapping camera views have been deployed on a large scale. Person re-identification, which matches pedestrian targets across cameras with disjoint views, has therefore become especially important. Deep learning relies on large amounts of data to overcome overfitting, yet current video-based person re-identification suffers from small datasets and homogeneous learned features. To address these problems, we propose an improved video-based person re-identification method: a generative adversarial network is used to generate additional video frame sequences, increasing the number of training samples, and feature information from pedestrian joints (key points) is incorporated to improve the model. Experimental results show that the proposed method effectively improves recognition rates on public datasets, reaching Rank-1 accuracies of 80.2% on PRID2011 and 66.3% on iLIDS-VID.
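This excerpt gives no implementation details for the joint (key-point) feature branch. A common way to inject pose information is to render the estimated joints as per-joint heatmaps and feed them to the appearance network together with the RGB frames. The PyTorch sketch below only illustrates that fusion idea under assumed tensor shapes; PoseFusionEncoder, the channel counts, and the 14-joint layout are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class PoseFusionEncoder(nn.Module):
    """Per-frame encoder that fuses RGB appearance with joint heatmaps.

    Illustrative only: the joint heatmaps (one channel per key point) are
    concatenated with the RGB channels so the backbone sees both appearance
    and body-structure cues.
    """
    def __init__(self, num_joints=14, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3 + num_joints, 32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, frames, joint_heatmaps):
        # frames:         (batch, 3, H, W) RGB frames of a tracklet
        # joint_heatmaps: (batch, num_joints, H, W) Gaussian maps centred on joints
        x = torch.cat([frames, joint_heatmaps], dim=1)
        x = self.backbone(x).flatten(1)
        return self.fc(x)

# Example: 8 frames of one tracklet with 14 joint heatmaps per frame.
frames = torch.randn(8, 3, 128, 64)
heatmaps = torch.rand(8, 14, 128, 64)
features = PoseFusionEncoder()(frames, heatmaps)   # -> (8, 128) per-frame features
```

In the paper's setting the heatmaps would come from an off-the-shelf pose estimator applied to each frame; here random tensors merely stand in for them.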
Key words:
- Person re-identification
- deep learning
- generative adversarial network (GAN)
- human pose estimation
1) Associate editor in charge of this paper: Liu Qingshan
Table 1 Matching rates of different methods on the PRID2011 dataset (%)
Table 2 The influence of different methods on matching rates on the PRID2011 dataset (%)

Method            Rank 1   Rank 5   Rank 10   Rank 20
ASTPN             77.0     95.0     99.0      99.0
ASTPN+GAN         79.2     95.3     99.2      99.2
ASTPN+KeyPoint    78.6     95.1     99.1      99.1
Proposed method   80.2     96.0     99.1      99.2
Table 3 Matching rates of different methods on the iLIDS-VID dataset (%)
Table 4 The influence of different methods on matching rates on the iLIDS-VID dataset (%)

Method            Rank 1   Rank 5   Rank 10   Rank 20
ASTPN             62.0     86.0     94.0      98.0
ASTPN+GAN         64.4     87.5     95.1      98.0
ASTPN+KeyPoint    64.5     87.5     96.1      98.1
Proposed method   66.3     88.4     96.2      98.1
Table 5 The influence of the number $N$ of frames generated recursively for each pedestrian tracklet on matching rates on the PRID2011 dataset (%)
$N$    Rank 1   Rank 5   Rank 10   Rank 20
1      77.4     95.2     99.0      99.0
3      78.6     95.6     99.1      99.1
5      80.2     96.0     99.2      99.3
7      80.1     96.0     99.2      99.2
9      77.6     95.5     98.7      99.1
11     76.7     95.4     98.2      99.0

Table 6 The influence of the number $N$ of frames generated recursively for each pedestrian tracklet on matching rates on the iLIDS-VID dataset (%)
$N$    Rank 1   Rank 5   Rank 10   Rank 20
1      61.9     86.7     94.2      97.8
3      63.1     87.5     95.6      98.0
5      66.3     88.4     96.2      98.1
7      66.0     88.4     96.0      98.1
9      64.8     87.9     94.3      97.9
11     64.6     86.6     94.1      97.6
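Tables 5 and 6 vary $N$, the number of frames generated recursively for each pedestrian tracklet, with the best Rank-1 obtained at $N = 5$. The generator architecture is not specified in this excerpt; the PyTorch sketch below only illustrates the recursive usage pattern, in which each synthetic frame is produced from the last few real or previously generated frames and is then appended to the sequence so that it conditions the next step. NextFrameGenerator and augment_tracklet are hypothetical names, not the authors' code.

```python
import torch
import torch.nn as nn

class NextFrameGenerator(nn.Module):
    """Toy generator: predicts the next frame from the previous K frames.

    A stand-in for the GAN generator; only the recursive usage pattern
    matters here, not the architecture.
    """
    def __init__(self, k_context=3):
        super().__init__()
        self.k = k_context
        self.net = nn.Sequential(
            nn.Conv2d(3 * k_context, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 3, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, context):          # context: (batch, K, 3, H, W)
        b, k, c, h, w = context.shape
        return self.net(context.reshape(b, k * c, h, w))   # (batch, 3, H, W)

def augment_tracklet(frames, generator, n_generated=5):
    """Recursively append n_generated synthetic frames to a tracklet.

    frames: (T, 3, H, W) real frames; each new frame is generated from the
    last K frames (real or synthetic) and then joins the context itself.
    """
    k = generator.k
    seq = list(frames)
    for _ in range(n_generated):
        context = torch.stack(seq[-k:]).unsqueeze(0)        # (1, K, 3, H, W)
        seq.append(generator(context).squeeze(0))
    return torch.stack(seq)                                 # (T + N, 3, H, W)

# Example: extend an 8-frame tracklet by N = 5 synthetic frames.
tracklet = torch.rand(8, 3, 128, 64) * 2 - 1
extended = augment_tracklet(tracklet, NextFrameGenerator(), n_generated=5)
print(extended.shape)   # torch.Size([13, 3, 128, 64])
```

The shape of the trade-off in Tables 5 and 6 is consistent with such a scheme: a few generated frames enlarge the training set, while too many recursive steps accumulate generation error and hurt the matching rate.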