2.845

2023影响因子

(CJCR)

  • 中文核心
  • EI
  • 中国科技核心
  • Scopus
  • CSCD
  • 英国科学文摘

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

基于自注意力模态融合网络的跨模态行人再识别方法研究

杜鹏 宋永红 张鑫瑶

杜鹏, 宋永红, 张鑫瑶. 基于自注意力模态融合网络的跨模态行人再识别方法研究. 自动化学报, 2022, 48(6): 1457−1468 doi: 10.16383/j.aas.c190340
引用本文: 杜鹏, 宋永红, 张鑫瑶. 基于自注意力模态融合网络的跨模态行人再识别方法研究. 自动化学报, 2022, 48(6): 1457−1468 doi: 10.16383/j.aas.c190340
Du Peng, Song Yong-Hong, Zhang Xin-Yao. Self-attention cross-modality fusion network for cross-modality person re-identification. Acta Automatica Sinica, 2022, 48(6): 1457−1468 doi: 10.16383/j.aas.c190340
Citation: Du Peng, Song Yong-Hong, Zhang Xin-Yao. Self-attention cross-modality fusion network for cross-modality person re-identification. Acta Automatica Sinica, 2022, 48(6): 1457−1468 doi: 10.16383/j.aas.c190340

基于自注意力模态融合网络的跨模态行人再识别方法研究

doi: 10.16383/j.aas.c190340
基金项目: 国家重点研究发展计划 (2017YFB1301101), 陕西省自然科学基础研究计划 (2018JM6104)资助
详细信息
    作者简介:

    杜鹏:西安交通大学软件学院硕士研究生. 主要研究方向为行人再识别.E-mail: xjydupeng@163.com

    宋永红:西安交通大学人工智能学院研究员. 主要研究方向为图像与视频内容理解, 智能软件开发. 本文通信作者. E-mail: songyh@xjtu.edu.cn

    张鑫瑶:西安交通大学软件学院硕士研究生. 主要研究方向为行人再识别.E-mail: xyzhangxy@stu.xjtu.edu.cn

Self-attention Cross-modality Fusion Network for Cross-modality Person Re-identification

Funds: Supported by National Key Research and Development Program of China (2017YFB1301101) and Natural Science Basic Research Program of Shaanxi Province (2018JM6104)
More Information
    Author Bio:

    DU Peng Master student at the School of Software Engineering, Xi'an Jiaotong University. His main research interest is person re-identification

    SONG Yong-Hong Researcher at the College of Artificial Intelligence, Xi'an Jiaotong University. Her research interest covers image and video content understanding, and intelligent software development. Corresponding author of this paper

    ZHANG Xin-Yao Master student at the School of Software Engineering, Xi'an Jiaotong University. Her main research interest is person re-identification

  • 摘要: 行人再识别是实现多目标跨摄像头跟踪的核心技术, 该技术能够广泛应用于安防、智能视频监控、刑事侦查等领域. 一般的行人再识别问题面临的挑战包括摄像机的低分辨率、行人姿态变化、光照变化、行人检测误差、遮挡等. 跨模态行人再识别相比于一般的行人再识别问题增加了相同行人不同模态的变化. 针对跨模态行人再识别中存在的模态变化问题, 本文提出了一种自注意力模态融合网络. 首先是利用CycleGAN生成跨模态图像. 在得到了跨模态图像后利用跨模态学习网络同时学习两种模态图像特征, 对于原始数据集中的图像利用SoftMax 损失进行有监督的训练, 对生成的跨模态图像利用LSR (Label smooth regularization) 损失进行有监督的训练. 之后, 使用自注意力模块将原始图像和CycleGAN生成的图像进行区分, 自动地对跨模态学习网络的特征在通道层面进行筛选. 最后利用模态融合模块将两种筛选后的特征进行融合. 通过在跨模态数据集SYSU-MM01上的实验证明了本文提出的方法和跨模态行人再识别其他方法相比有一定程度的性能提升.
  • 图  1  行人再识别和多目标跨摄像头跟踪关系示意

    Fig.  1  The relationship between person re-identification and multi-target cross-camera tracking

    图  2  跨模态行人再识别数据

    Fig.  2  Data of cross-modality person re-identification

    图  3  自注意力模态融合网络

    Fig.  3  Self-attention cross-modality fusion network

    图  4  CycleGAN网络示意图

    Fig.  4  Structure of CycleGAN network

    图  5  利用CycleGAN生成的跨模态图像

    Fig.  5  Generated cross-modality images using CycleGAN

    图  6  包含较多噪声的跨模态转换后的图像

    Fig.  6  Generated cross-modality images with more noise

    图  7  自注意力模块示意图

    Fig.  7  Structure of self-attention model

    表  1  各模块在SYSU-MM01 All-search模式下的实验结果

    Table  1  Experimental results of each module in SYSU-MM01 dataset and All-search mode

    方法 All-search
    Single-shot Multi-shot
    Rank 1 Rank 10 Rank 20 mAP Rank 1 Rank 10 Rank 20 mAP
    Baseline 27.36 71.95 84.58 28.53 32.48 78.34 88.93 23.17
    跨模态学习 30.83 72.35 84.07 31.45 37.25 80.58 90.22 25.48
    跨模态 + 自注意力 31.3 73.34 84.78 31.72 37.98 81.76 91.05 25.39
    跨模态 + 模态融合 31.85 74.38 85.66 32.49 38.65 81.74 91.25 26.46
    自注意力模态融合 33.31 74.51 85.79 33.18 39.71 82 91.14 26.89
    下载: 导出CSV

    表  2  各模块在SYSU-MM01 Indoor-search模式下的实验结果

    Table  2  Experimental results of each module in SYSU-MM01 dataset and Indoor-search mode

    方法 Indoor-search
    Single-shot Multi-shot
    Rank 1 Rank 10 Rank 20 mAP Rank 1 Rank 10 Rank 20 mAP
    Baseline 32.17 81.3 92.26 42.76 38.95 85.29 93.62 33.73
    跨模态学习 37.21 80.81 90.29 47.06 43.98 86.01 93.37 37.09
    跨模态 + 自注意力 36.55 80.32 90.41 46.42 44.89 85.31 94.18 36.43
    跨模态 + 模态融合 37.63 81.75 91.48 47.73 44.82 87.26 94.97 38.07
    自注意力模态融合 38.09 81.68 90.61 47.86 45.8 86.72 93.86 37.95
    下载: 导出CSV

    表  3  加入各模块后的GFLOPs和参数量

    Table  3  GFLOPs and parameters after joining each module

    方法 GFLOPs GFLOPs相比于Baseline的变化 参数量 参数量相比于Baseline的变化
    Baseline 2.702772224 25 557 032
    跨模态学习 2.702772224 0 25 557 032 0
    跨模态 + 自注意力 2.7038208 0.001048576 26 609 960 +1 052 928 (4.12%)
    跨模态 + 模态融合 5.405544448 2.702772224 25 557 032 0
    自注意力模态融合 5.409639424 2.7068672 27 136 424 +1 579 392 (6.18%)
    下载: 导出CSV

    表  4  在SYSU-MM01 All-search模式下和跨模态行人再识别的对比实验

    Table  4  Comparative experiments between our method and others in SYSU-MM01 dataset and All-search mode

    方法 All-search
    Single-shot Multi-shot
    Rank 1 Rank 10 Rank 20 mAP Rank 1 Rank 10 Rank 20 mAP
    HOG + Euclidean 2.76 18.25 31.91 4.24 3.82 22.77 37.63 2.16
    Zero-padding 14.8 54.12 71.33 15.95 19.13 61.4 78.41 10.89
    BDTR 17.01 55.43 71.96 19.66
    cmGAN 26.97 67.51 80.56 27.8 31.49 72.74 85.01 22.27
    Baseline (本文方法) 27.36 71.95 84.58 28.53 32.48 78.34 88.93 23.17
    跨模态学习网络 (本文方法) 30.83 72.35 84.07 31.45 37.25 80.58 90.22 25.48
    自注意力模态融合 (本文方法) 33.31 74.51 85.79 33.18 39.71 82 91.14 26.89
    下载: 导出CSV

    表  5  在SYSU-MM01 Indoor-search模式下和跨模态行人再识别的对比实验

    Table  5  Comparative experiments between our method and others in SYSU-MM01 dataset and Indoor-search mode

    方法 Indoor-search
    Single-shot Multi-shot
    Rank 1 Rank 10 Rank 20 mAP Rank 1 Rank 10 Rank 20 mAP
    HOG + Euclidean 3.22 24.68 44.52 7.25 4.75 29.06 49.38 3.51
    Zero-padding 20.58 68.38 85.79 26.92 24.43 75.86 91.32 18.64
    cmGAN 31.63 77.23 89.18 42.19 37 80.94 92.11 32.76
    Baseline (本文方法) 32.17 81.3 92.26 42.76 38.95 85.29 93.62 33.73
    跨模态学习网络 (本文方法) 37.21 80.81 90.29 47.06 43.98 86.01 93.37 37.09
    自注意力模态融合 (本文方法) 38.09 81.68 90.61 47.86 45.8 86.72 93.86 37.95
    下载: 导出CSV
  • [1] 李幼蛟, 卓力, 张菁, 李嘉锋, 张辉. 行人再识别技术综述[J]. 自动 化学报, 2018, 44(9): 1554-1568

    LI You-Jiao, ZHUO Li, ZHANG Jing, LI Jia-Feng, ZHANG Hui. A Survey of Person Re-identiflcation. Acta Automatica Sinica, 2018, 44(9): 1554-1568
    [2] 吴彦丞, 陈鸿昶, 李邵梅, 高超. 基于行人属性先验分布的行人再识 别. 自动化学报, 2019, 45(5): 953-964

    Wu Yan-Cheng, Chen Hong-Chang, Li Shao-Mei, Gao Chao. Person re-identiflcation using attribute priori distribution. Acta Automatica Sinica, 2019, 45(5): 953-964
    [3] Zhang L, Ma B, Li G, Huang Q, Tian Q. Generalized semisupervised and structured subspace learning for cross-modal retrieval. IEEE Transactions on Multimedia, 2018, 20: 128-141 doi: 10.1109/TMM.2017.2723841
    [4] Zhang L, Ma B P, Li G R, Huang Q M, Tian Q. PL-ranking: A novel ranking method for cross-modal retrieval. In: Proceedings of the 24th ACM on Multimedia Conference. Amsterdam, the Netherlamds: ACM, 2016. 1355−1364
    [5] Krizhevsky A, Sutskever I, Hinton G. ImageNet classiflcation with deep convolutional neural networks. COMMUNICATIONS OF THE ACM, 2012, 60: 84-90
    [6] Gray D, Tao H. Viewpoint invariant pedestrian recognition with an ensemble of localized features. In: Proceedings of the 10th European Conference on Computer Vision, Marseille, France: Springer, 2008. 262−275
    [7] Yang Y, Yang J M, Yan J J, Liao S C, Yi D, Li S Z. Salient color names for person re-identiflcation. In: Proceedings of the 2014 Eruopeam Computer Vision. Zuich, Switzerland: Springer, 2014. Part I : 536−551
    [8] Köstinger M, Hirzer M, Wohlhart P, Roth P M, Bischof H. Large scale metric learning from equivalence constraints. In: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE, 2012. 2288−2295
    [9] Zheng Z, Zheng L, Yang Y. A discriminatively learned cnn embedding for person reidentiflcation.ACM Transactions on Multimedia Computing Communications and Applications, 2016, 14(1):13-1-13-20
    [10] Bromley J, Bentz J W, BOTTOU L, Guyon I, Lecun Y, Moore C, et al. Signature veriflcation using a “ siamese” time delay neural network. International Journal of Pattern Recognition and Artiflcial Intelligence, 1993, 7(4): 669-688 doi: 10.1142/S0218001493000339
    [11] Zhang X, Luo H, Fan X, Xiang W L, Sun Y X, Xiao Q Q, Jiang W, Zhang C, Sun J. AlignedReID: Surpassing humanlevel performance in person re-identiflcation. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017.
    [12] Zhao H Y, Tian M Q, Sun S Y, Shao J, Yan J J, Yi S, Wang X G, Tang X O. Spindle net: Person re-identiflcation with human body region guided feature decomposition and fusion. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017, 907−915
    [13] Dai Z Z, Chen M Q, Zhu S Y, Tan P. Batch feature erasing for person re-identiflcation and beyond. Computer Research Repository, 2018.
    [14] Zhong Z, Zheng L, Zheng Z D, Li S Z, Yang Y. Camera style adaptation for person re-identiflcation. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake, USA: IEEE, 2018. 5157−5166
    [15] Zhu J Y, Park T, Isola P, ECfros A A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 2242−2251
    [16] Wu A C, Zheng W S, Yu H X, Gong S G, Lai J H. RGB-infrared crossmodality person re-identiflcation. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 5390−5399
    [17] Ye M, Wang Z, Lan X Y, Yuen P C. Visible thermal person re-identiflcation via dual-constrained top-ranking. In: Proceedings of the 27th International Joint Conference on Artiflcial Intelligence. Stockholm, Sweden: AAAI, 2018.1092−1099
    [18] Dai P Y, Ji R R, Wang H B, Wu Q, Huang Y Y. Cross-modality person re-identiflcation with generative adversarial training. In: Proceedings of the 27th International Joint Conference on Artiflcial Intelligence. Stockholm, Sweden: AAAI, 2018. 677−683
    [19] Lin J W, Li H. HPILN: A feature learning framework for crossmodality person re-identiflcation. [Online], available: https://arxiv.org/abs/1906.03142, August 14, 2019
    [20] Szegedy C, Vanhoucke V, Iofie S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 2818−2826
    [21] Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Advances in Neural Information Processing Systems, Berlin, Germany: Springer, 2014. 2672−2680
    [22] 林懿伦, 戴星原, 李力, 王晓, 王飞跃. 人工智能研究的新前线: 生 成式对抗网络. 自动化学报, 2018, 44(5): 775-792

    Lin Yi-Lun, Dai Xing-Yuan, Li Li, Wang Xiao, Wang FeiYue. The new frontier of AI research: generative adversarial networks. Acta Automatica Sinica, 2018, 44(5): 775-792
    [23] Mirza M P, Osindero S. Conditional generative adversarial nets. [Online], available: https://arxiv.org/abs/1411.1784, November 6, 2014
    [24] Isola P, Zhu J Y, Zhou T H, Efros A A. Image-to-image translation with conditional adversarial networks. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 5967−5976
    [25] Yi Z L, Zhang H, Tan P, Gong M L. DualGAN: unsupervised dual learning for image-to-image translation. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 2868−2876
    [26] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 770−778
    [27] Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018. 7132−7141
    [28] Nair V, Hinton G E. Rectifled linear units improve restricted Boltzmann machines vinod nair. In: Proceedings of the 27th International Conference on International Conference on Machine Learning. Haifa, Israel: Omnipress, 2010. 807−814
    [29] Yin X, Goudriaan J W, Lantinga E A, Vos J C, Spiertz H L. A flexible sigmoid function of determinate growth. Annals of botany, 2003, 91(3): 361-371.
    [30] Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z M, Desmaison A, Antiga L, Lerer A. Automatic differentiation in PyTorch. In: Proceedings of the 31st Conference and Workshop on Neural Information Processing Systems. California, USA: NIPS, 2017.
    [31] Reddi S J, Kale S, Kumar S. On the convergence of Adam and Beyond. In: Proceedings of the 6th International Conference on Learning Representations. Vancouver, BC, Canada: ICLR, 2018.
    [32] Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, CA, USA: IEEE, 2005. 886−893
    [33] Liao S C, Hu Y, Zhu X Y, Li S Z. Person re-identiflcation by local maximal occurrence representation and metric learning. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015. 2197−2206
  • 加载中
图(7) / 表(5)
计量
  • 文章访问数:  987
  • HTML全文浏览量:  579
  • PDF下载量:  390
  • 被引次数: 0
出版历程
  • 收稿日期:  2019-05-07
  • 录用日期:  2019-10-11
  • 网络出版日期:  2022-04-22
  • 刊出日期:  2022-06-02

目录

    /

    返回文章
    返回