融合生成对抗网络和姿态估计的视频行人再识别方法

刘一敏; 蒋建国; 齐美彬; 刘皓; 周华捷

doi:10.16383/j.aas.c180054

融合生成对抗网络和姿态估计的视频行人再识别方法

doi: 10.16383/j.aas.c180054

刘一敏^1,,
蒋建国^{1, 2,},
齐美彬^{1, 2, ,},
刘皓^3,,
周华捷^1,

1.
合肥工业大学计算机与信息学院合肥 230009
2.
工业安全与应急技术安徽省重点实验室合肥 230009
3.
腾讯优图实验室合肥 230009

基金项目:

国家自然科学基金 61371155

国家自然科学基金 61771180

安徽省重点研究与开发项目 1704d0802183

详细信息

作者简介:
刘一敏  合肥工业大学计算机与信息学院硕士研究生.主要研究方向为计算机视觉, 图像处理, 行人再识别. E-mail: yiminliu@mail.hfut.edu.cn

蒋建国  合肥工业大学计算机与信息学院教授.主要研究方向为数字图像分析和处理, 分布式智能系统和数字信号处理技术及应用. E-mail: jgjiang@hfut.edu.cn

刘皓  腾讯优图实验室研究员. 2018年获得合肥工业大学博士学位.主要研究方向为计算机视觉, 行人再识别, 图像检索. E-mail: hfut.haoliu@gmail.com

周华捷  合肥工业大学计算机与信息学院硕士研究生.主要研究方向为计算机视觉, 图像处理, 行人再识别. E-mail: Zhou hj@mail.hfut.edu.cn

通讯作者:
齐美彬合肥工业大学计算机与信息学院教授.主要研究方向为视频编码, 运动目标检测与跟踪和DSP技术.本文通信作者. E-mail: qimeibin@163.com

计量
- 文章访问数: 2504
- HTML全文浏览量: 489
- PDF下载量: 345
- 被引次数: 0
出版历程
- 收稿日期: 2018-01-22
- 录用日期: 2018-07-02
- 刊出日期: 2020-03-30

Video-based Person Re-identification Method Based on GAN and Pose Estimation

LIU Yi-Min^1
,,
JIANG Jian-Guo^{1, 2
,},
QI Mei-Bin^{1, 2
, ,},
LIU Hao^3
,,
ZHOU Hua-Jie^1
,

1.
School of Computer and Information, Hefei University of Technology, Hefei 230009
2.
Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei 230009
3.
YouTu Laboratory, Tencent, Hefei 230009

Funds:

National Natural Science Foundation of China 61371155

National Natural Science Foundation of China 61771180

Anhui Province Key Research and Development Projects 1704d0802183

More Information

Author Bio:
LIU Yi-Min Master student at the School of Computer and Information, Hefei University of Technology. His research interest covers computer vision, image processing, and person re-identiflcation

JIANG Jian-Guo Professor at the School of Computer and Information, Hefei University of Technology. His research interest covers digital image analysis and processing, distributed intelligent systems, digital signal processing DSP) technology, and applications

LIU HAO Researcher of Tencent YouTu Laboratory, He received his Ph. D. degree from Hefei University of Technology in 2018. His research interest covers computer vision, person re-identiflcation, and image retrieval

ZHOU Hua-Jie Master student at the School of Computer and Information, Hefei University of Technology. His research interest covers computer vision, image processing, and person re-identiflcation

Corresponding author: QI Mei-Bin Professor at the School of Computer and Information, Hefei University of Technology. His research interest covers video coding, moving target detection and tracking, and DSP technology. Corresponding author of this paper

摘要

摘要: 随着国家对社会公共安全的日益重视, 无重叠视域监控系统已大规模的普及.行人再识别任务通过匹配不同视域摄像机下的行人目标, 在当今环境下显得尤为重要.由于深度学习依赖大数据解决过拟合的特性, 针对当前视频行人再识别数据量较小和学习特征单一的问题, 我们提出了一种基于视频的改进行人再识别方法, 该方法通过生成对抗网络去生成视频帧序列来增加样本数量和加入了行人关节点的特征信息去提升模型效率.实验结果表明, 本文提出的改进方法可以有效地提高公开数据集的识别率, 在PRID2011, iLIDS-VID数据集上进行实验, Rank 1分别达到了80.2%和66.3 %.
- 行人再识别 /
- 深度学习 /
- 生成对抗网络 /
- 人体姿态估计
Abstract: As the government keeps attaching importance to public security, non-overlapping viewsheds surveillance systems have been deployed widely. It has become especially important to recognize pedestrian target through matching cameras with different viewsheds in nowadays. Deep learning relies on big data to solve overfitting. However, the current video-based person re-identification only has small data volume and homogeneous learning features. To solve this, we put forward a method to improve person re-identification based on the video. This method can increase the sample quantity by generating video frame sequence through generative adversarial network. It also adds the feature information of the pedestrian joints, which can improve the model efficiency. The experiment result shows that the modified method discussed in this paper can improve the recognition rate of public datasets effectively. In the experiments on PRID2011 and iLIDS-VID, Rank 1 attained 80.2 % and 66.3 %, respectively.
- Person re-identiflcation /
- deep learning /
- generative adversarial network(GAN) /
- human pose estimation
Recommended by Associate Editor LIU Qing-Shan
注释:

1) 本文责任编委刘青山

HTML全文

图 1 多尺度结构

Fig. 1 Multi-scale architecture

下载: 全尺寸图片幻灯片

图 2 生成对抗网络生成的视频帧序列(后5帧)

Fig. 2 A sequence of video frames generated by GAN (last five frames)

下载: 全尺寸图片幻灯片

图 3 CPM算法的网络结构

Fig. 3 Structure of CPM algorithm

下载: 全尺寸图片幻灯片

图 4 CPM算法检测到的行人关节点特征

Fig. 4 Pedestrian keypoint features detected by CPM algorithm

下载: 全尺寸图片幻灯片

图 5 融合生成对抗网络和姿态估计算法网络结构

Fig. 5 The structure of integration of GAN and pose estimation algorithm

下载: 全尺寸图片幻灯片

表 1 不同算法在PRID2011数据集上的识别率(%)

Table 1 Matching rates of different methods on the PRID2011 dataset (%)

方法	Rank 1	Rank 5	Rank 10	Rank 20
AFDA^[5]	43.0	72.7	84.6	91.9
VR^[4]	41.8	64.5	77.5	89.4
STA^[7]	64.1	87.3	89.9	92.0
RFA^[9]	64.1	85.8	93.7	98.4
RNN-CNN^[8]	70.0	90.0	95.0	97.0
ASTPN^[11]	77.0	95.0	99.0	99.0
本文方法	80.2	96.0	99.1	99.2

下载: 导出CSV

表 2 不同算法在PRID2011数据集上对识别率的影响(%)

Table 2 The influence of different methods on matching rates based on PRID2011 dataset (%)

方法	Rank 1	Rank 5	Rank 10	Rank 20
ASTPN	77.0	95.0	99.0	99.0
ASTPN+GAN	79.2	95.3	99.2	99.2
ASTPN+KeyPoint	78.6	95.1	99.1	99.1
本文方法	80.2	96.0	99.1	99.2

下载: 导出CSV

表 3 不同算法在iLIDS-VID数据集上的识别率(%)

Table 3 Matching rates of different methods on the iLIDS-VID dataset (%)

方法	Rank 1	Rank 5	Rank 10	Rank 20
AFDA^[5]	37.5	62.7	73.0	81.8
VR^[4]	34.5	56.7	67.5	77.5
STA^[7]	44.3	71.7	83.7	91.7
RFA^[9]	49.3	76.8	85.3	90.1
RNN-CNN ^[8]	58.0	84.0	91.0	96.0
ASTPN^[11]	62.0	86.0	94.0	98.0
本文方法	66.3	88.4	96.2	98.1

下载: 导出CSV

表 4 不同算法在iLIDS-VID数据集上对识别率的影响(%)

Table 4 The influence of different methods on matching rates based on iLIDS-VID dataset (%)

方法	Rank 1	Rank 5	Rank 10	Rank 20
ASTPN	62.0	86.0	94.0	98.0
ASTPN+GAN	64.4	87.5	95.1	98.0
ASTPN+KeyPoint	64.5	87.5	96.1	98.1
本文方法	66.3	88.4	96.2	98.1

下载: 导出CSV

表 5 每个行人轨迹递归生成的图片张数$N$对PRID2011数据集上识别率的影响(%)

Table 5 The influence of the number $N$ of pictures generated recursively by each pedestrian trace on matching rate based on PRID2011 dataset (%)

$ N$	Rank 1	Rank 5	Rank 10	Rank 20
1	77.4	95.2	99.0	99.0
3	78.6	95.6	99.1	99.1
5	80.2	96.0	99.2	99.3
7	80.1	96.0	99.2	99.2
9	77.6	95.5	98.7	99.1
11	76.7	95.4	98.2	99.0

下载: 导出CSV

表 6 每个行人轨迹递归生成的图片张数$N$对iLIDS-VID数据集上识别率的影响(%)

Table 6 The influence of the number $N$ of pictures generated recursively by each pedestrian trace on matching rate based on iLIDS-VID dataset (%)

$N $	Rank 1	Rank 5	Rank 10	Rank 20
1	61.9	86.7	94.2	97.8
3	63.1	87.5	95.6	98.0
5	66.3	88.4	96.2	98.1
7	66.0	88.4	96.0	98.1
9	64.8	87.9	94.3	97.9
11	64.6	86.6	94.1	97.6

下载: 导出CSV

参考文献(30)

[1]	Yi D, Lei Z, Liao S C, Li S Z. Deep metric learning for person re-identification. In: Proceedings of the 22nd International Conference on Pattern Recognition. Stockholm, Sweden: IEEE, 2014. 34-39 http://dl.acm.org/citation.cfm?id=2703838
[2]	Varior R R, Shuai B, Lu J W, Xu D, Wang G. A Siamese long short-term memory architecture for human re-identification. In: Proceedings of the 14th European Conference on Computer Vision (ECCV). Amsterdam, The Netherlands: Springer, 2016. 135-153 doi: 10.1007/978-3-319-46478-7_9
[3]	Liu H, Feng J S, Qi M B, Jiang J G, Yan S C. End-to-end comparative attention networks for person re-identification. IEEE Transactions on Image Processing, 2017, 26(7): 3492 -3506 doi: 10.1109/TIP.2017.2700762
[4]	Wang T Q, Gong S G, Zhu X T, Wang S J. Person re-identification by video ranking. In: Proceedings of the 13th European Conference on Computer Vision (ECCV). Zurich, Switzerland: Springer, 2014. 688-703
[5]	Li Y, Wu Z Y, Karanam S, Radke R J. Multi-shot human re-identification using adaptive fisher discriminant analysis. In: Proceedings of the 2015 British Machine Vision Conference (BMVC). Swansea, UK: BMVA Press, 2015. 73.1-73.12
[6]	Zhu X K, Jing X Y, Wu F, Feng H. Video-based person re-identification by simultaneously learning intra-video and inter-video distance metrics. In: Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI). New York, USA: ACM, 2016. 3552-3558 http://dl.acm.org/citation.cfm?id=3061053.3061117
[7]	Liu K, Ma B P, Zhang W, Huang R. A spatio-temporal appearance representation for video-based pedestrian re-identification. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 3810-3818 http://www.researchgate.net/publication/304409873_A_Spatio-Temporal_Appearance_Representation_for_Viceo-Based_Pedestrian_Re-Identification
[8]	McLaughlin N, del Rincon J M, Miller P. Recurrent convolutional network for video-based person re-identification. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 1325-1334 http://ieeexplore.ieee.org/document/7780517/
[9]	Yan Y C, Ni B B, Song Z C, Ma C, Yan Y, Yang X K. Person re-identification via recurrent feature aggregation. In: Proceedings of the 14th European Conference on Computer Vision (ECCV). Amsterdam, The Netherlands: Springer, 2016. 701-716 doi: 10.1007/978-3-319-46466-4_42.pdf
[10]	Liu H, Jie Z Q, Jayashree K, Qi M B, Jiang J G, Yan S C, et al. Video-based person re-identification with accumulative motion context. IEEE Transactions on Circuits and Systems for Video Technology, 2017, DOI: 10.1109/TCSVT.2017. 2715499
[11]	Xu S J, Cheng Y, Gu K, Yang Y, Chang S Y, Zhou P. Jointly attentive spatial-temporal pooling networks for video-based person re-identification. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 4743-4752 http://ieeexplore.ieee.org/document/8237769/
[12]	Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. In: Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS). Montreal, Canada: ACM, 2014. 2672-2680
[13]	Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. In: Proceedings of the 2016 International Conference on Learning Representations (ICLR). Caribe Hilton, San Juan, Puerto Rico, 2016
[14]	Mirza M, Osindero S. Conditional generative adversarial nets. arXiv: 1411.1784, 2014. 2672-2680
[15]	Denton E, Chintala S, Szlam A, Fergus R. Deep generative image models using a Laplacian pyramid of adversarial networks. In: Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS). Montreal, Canada: ACM, 2015. 1486-1494
[16]	Arjovsky M, Chintala S, Bottou L. Wasserstein GAN. arXiv: 1701.07875, 2017
[17]	Agarwal A, Triggs B. 3D human pose from silhouettes by relevance vector regression. In: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). Washington, DC, USA: IEEE, 2004. Ⅱ-882-Ⅱ-888
[18]	Mori G, Malik J. Estimating human body configurations using shape context matching. In: Proceedings of the 7th European Conference on Computer Vision (ECCV). Copenhagen, Denmark: Springer, 2002. 666-680
[19]	Taylor G W, Fergus R, Williams G, Spiro I, Bregler C. Pose-sensitive embedding by nonlinear NCA regression. In: Proceedings of the 23rd International Conference on Neural Information Processing Systems (NIPS). Vancouver, BC, Canada: ACM, 2010. 2280-2288
[20]	Felzenszwalb P F, Huttenlocher D P. Pictorial structures for object recognition. International Journal of Computer Vision, 2005, 61(1): 55-79 http://d.old.wanfangdata.com.cn/OAPaper/oai_arXiv.org_1108.4079
[21]	Jain A, Tompson J, Andriluka M, Taylor G, Bregler C. Learning human pose estimation features with convolutional networks. In: Proceedings of the 2014 ICLR. Banff, Canada, 2014. 1-14
[22]	Pfister T, Charles J, Zisserman A. Flowing convnets for human pose estimation in videos. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 1913-1921
[23]	Jing X Y, Zhu X K, Wu F, Hu R M, You X G, Wang Y H, et al. Super-resolution person re-identification with semi-coupled low-rank discriminant dictionary learning. IEEE Transactions on Image Processing, 2017, 26(3): 1363-1378 doi: 10.1109/TIP.2017.2651364
[24]	Zheng Z D, Zheng L, Yang Y. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 3774-3782
[25]	Qian X L, Fu Y W, Xiang T, Wang W X, Qiu J, Wu Y, et al. Pose-normalized image generation for person re-identification. arXiv: 1712.02225, 2018.
[26]	Deng W J, Zheng L, Ye Q X, Kang G L, Yang Y, Jiao J B. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018. 994-1003
[27]	Mathieu M, Couprie C, LeCun Y. Deep multi-scale video prediction beyond mean square error. In: Proceedings of the 4th International Conference on Learning Representations (ICLR). Caribe Hilton, San Juan, Argentina, 2016.
[28]	Wei S E, Ramakrishna V, Kanade T, Sheikh Y. Convolutional pose machines. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 4724-4732
[29]	Ramakrishna V, Munoz D, Hebert M, Bagnell J A, Sheikh Y. Pose machines: articulated pose estimation via inference machines. In: Proceedings of the 13th European Conference on Computer Vision (ECCV). Zurich, Switzerland: Springer, 2014. 33-47 http://link.springer.com/openurl?id=doi:10.1007/978-3-319-10605-2_3
[30]	Cao Z, Simon T, Wei S E, Sheikh Y. Realtime multi-person 2D pose estimation using part affinity fields. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 1302-1310 http://www.researchgate.net/publication/310953055_Realtime_Multi-Person_2D_Pose_Estimation_using_Part_Affinity_Fields

施引文献

资源附件(0)

访问统计

图(5) / 表(6)

计量

文章访问数: 2504
HTML全文浏览量: 489
PDF下载量: 345
被引次数: 0

姓名
邮箱
手机号码
标题
留言内容
验证码

留言板

融合生成对抗网络和姿态估计的视频行人再识别方法

doi: 10.16383/j.aas.c180054

通讯作者:
齐美彬合肥工业大学计算机与信息学院教授.主要研究方向为视频编码, 运动目标检测与跟踪和DSP技术.本文通信作者. E-mail: qimeibin@163.com

计量

Video-based Person Re-identification Method Based on GAN and Pose Estimation

Corresponding author: QI Mei-Bin Professor at the School of Computer and Information, Hefei University of Technology. His research interest covers video coding, moving target detection and tracking, and DSP technology. Corresponding author of this paper

计量

目录

留言板

融合生成对抗网络和姿态估计的视频行人再识别方法

doi: 10.16383/j.aas.c180054

通讯作者: 齐美彬 合肥工业大学计算机与信息学院教授.主要研究方向为视频编码, 运动目标检测与跟踪和DSP技术.本文通信作者. E-mail: qimeibin@163.com

计量

出版历程

Video-based Person Re-identification Method Based on GAN and Pose Estimation

Corresponding author: QI Mei-Bin Professor at the School of Computer and Information, Hefei University of Technology. His research interest covers video coding, moving target detection and tracking, and DSP technology. Corresponding author of this paper

计量

出版历程

目录

通讯作者:
齐美彬合肥工业大学计算机与信息学院教授.主要研究方向为视频编码, 运动目标检测与跟踪和DSP技术.本文通信作者. E-mail: qimeibin@163.com