Designing Affinity Model for Multiple Object Tracking Based on Deep Learning

Hou Jian-Hua, Zhang Guo-Shuai, Xiang Jun

Citation: Hou Jian-Hua, Zhang Guo-Shuai, Xiang Jun. Designing affinity model for multiple object tracking based on deep learning. Acta Automatica Sinica, 2020, 46(12): 2690−2700 doi: 10.16383/j.aas.c180528

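Funds: 

National Natural Science Foundation of China 61671484

National Natural Science Foundation of China 61701548

Hubei Provincial Natural Science Foundation of China 2018CFB503

Fundamental Research Funds for the Central Universities, South-Central University for Nationalities CZQ17001

Fundamental Research Funds for the Central Universities, South-Central University for Nationalities CZZ18001

Fundamental Research Funds for the Central Universities, South-Central University for Nationalities CZY18046

More Information
    Author Bio:

    HOU Jian-Hua  Professor at the College of Electronic Information Engineering, South-Central University for Nationalities. He received his Ph.D. degree in pattern recognition and intelligent systems from Huazhong University of Science and Technology in 2007. His research interests include computer vision and pattern recognition. E-mail: zil@scuec.edu.cn

    ZHANG Guo-Shuai  Master's student at the College of Electronic Information Engineering, South-Central University for Nationalities. He received his bachelor's degree from Changchun University in 2016. His research interests include image processing and pattern recognition. E-mail: guoshuaiz@scuec.edu.cn

    Corresponding author:

    XIANG Jun  Lecturer at the College of Electronic Information Engineering, South-Central University for Nationalities. She received her Ph.D. degree in control science and engineering from Huazhong University of Science and Technology in 2016. Her research interests include computer vision and pattern recognition. Corresponding author of this paper. E-mail: junxiang@scuec.edu.cn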

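  • Abstract: In recent years, deep learning has achieved breakthroughs in computer vision. However, research on deep-learning-based video multiple object tracking (MOT) remains relatively scarce, and a robust affinity model is the core of detection-based MOT methods. This paper proposes an affinity model based on deep neural networks and metric learning. The target appearance model is built from convolutional neural networks (CNNs) and the metric learning techniques widely used in person re-identification (Re-ID). Specifically, a three-channel CNN is trained with a triplet loss function to extract more discriminative appearance features, and these features are used to compute the appearance similarity between targets. The appearance similarity is then combined with a motion model to compute the association probability between tracklets. For the association strategy, the Hungarian algorithm is used. Short but reliable tracklets are first obtained by frame-by-frame association. They are then linked by multi-level association with an adaptive time sliding window mechanism, which outputs the final trajectory of each target. Experiments on the public 2DMOT2015 and MOT16 datasets demonstrate the effectiveness of the proposed method, which achieves tracking performance comparable to or better than several current mainstream algorithms.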
    Recommended by Associate Editor SANG Nong
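    The appearance model in the abstract is trained with a triplet loss. The plain-Python sketch below shows only the loss, computed on precomputed embedding vectors. It assumes the hinge form max(0, d(a, p) - d(a, n) + margin) with Euclidean distance, as popularized by FaceNet [34] and used for Re-ID in [37]. The margin of 0.2 and the function names are illustrative assumptions rather than the paper's settings, and the CNN that produces the embeddings is omitted. We also read "three-channel" as one weight-shared branch per element of the (anchor, positive, negative) triplet; that reading is our interpretation.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.2) -> float:
    """Hinge triplet loss: max(0, d(a, p) - d(a, n) + margin).

    anchor and positive are crops of the same target; negative is a crop
    of a different target. The loss is zero once the negative lies at least
    `margin` farther from the anchor than the positive does.
    """
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


def batch_triplet_loss(triplets: Sequence[Tuple[Vector, Vector, Vector]],
                       margin: float = 0.2) -> float:
    """Mean triplet loss over a batch of (anchor, positive, negative) tuples."""
    losses = [triplet_loss(a, p, n, margin) for a, p, n in triplets]
    return sum(losses) / len(losses)
```

    Easy triplets, where the negative is already far away, contribute zero loss. Only "hard" triplets that violate the margin produce a gradient that pulls same-identity crops together and pushes different identities apart.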
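    The abstract combines appearance similarity with a motion model to obtain a link probability between tracklets, but this page does not reproduce the exact formulas. The sketch below assumes a form that is common in the tracklet-association literature (e.g. [29]). Appearance affinity is a Gaussian of the embedding distance, motion affinity is a Gaussian of the constant-velocity prediction error, and the link probability is their product. The `Tracklet` fields, the sigma values, and the forward-only extrapolation are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Tracklet:                       # hypothetical container, for illustration
    start_frame: int
    end_frame: int
    head_pos: Tuple[float, float]     # box centre at start_frame
    tail_pos: Tuple[float, float]     # box centre at end_frame
    velocity: Tuple[float, float]     # pixels per frame, estimated near the tail
    embedding: List[float]            # appearance feature from the CNN


def gaussian(dist_sq: float, sigma: float) -> float:
    return math.exp(-dist_sq / (2.0 * sigma ** 2))


def appearance_affinity(a: Tracklet, b: Tracklet, sigma_a: float = 1.0) -> float:
    """Similarity from the squared Euclidean distance between embeddings."""
    d_sq = sum((x - y) ** 2 for x, y in zip(a.embedding, b.embedding))
    return gaussian(d_sq, sigma_a)


def motion_affinity(a: Tracklet, b: Tracklet, sigma_m: float = 10.0) -> float:
    """Extrapolate a's tail with constant velocity to b's first frame."""
    gap = b.start_frame - a.end_frame
    if gap <= 0:
        return 0.0                    # b must start after a ends
    px = a.tail_pos[0] + a.velocity[0] * gap
    py = a.tail_pos[1] + a.velocity[1] * gap
    d_sq = (px - b.head_pos[0]) ** 2 + (py - b.head_pos[1]) ** 2
    return gaussian(d_sq, sigma_m)


def link_probability(a: Tracklet, b: Tracklet) -> float:
    """Assumes the two cues are independent: appearance term times motion term."""
    return appearance_affinity(a, b) * motion_affinity(a, b)
```

    Multiplying the two terms means a candidate link must be plausible under both cues. A visually similar tracklet in the wrong place, or a well-placed tracklet with a different appearance, both receive a low probability.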
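    Given a matrix of link probabilities, the Hungarian algorithm finds the one-to-one assignment with the minimum total cost. The self-contained sketch below implements the standard O(n²m) Kuhn-Munkres variant with row and column potentials. The conversion cost = 1 - affinity and the rejection threshold of 0.5, applied after solving, are our assumptions and are not taken from the paper. In practice, a library routine such as SciPy's `scipy.optimize.linear_sum_assignment` solves the same problem.

```python
from typing import List, Tuple


def hungarian(cost: List[List[float]]) -> List[int]:
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns col, where col[i] is the column assigned to row i. Classic
    Kuhn-Munkres with row/column potentials, O(n^2 * m).
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("transpose the matrix so that rows <= columns")
    inf = float("inf")
    u = [0.0] * (n + 1)              # row potentials (1-indexed)
    v = [0.0] * (m + 1)              # column potentials (1-indexed)
    p = [0] * (m + 1)                # p[j]: row matched to column j, 0 = free
    way = [0] * (m + 1)              # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:                  # grow a shortest augmenting path from row i
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):   # update potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:           # reached a free column
                break
        while j0:                    # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    col = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            col[p[j] - 1] = j - 1
    return col


def associate(affinity: List[List[float]], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Match rows (e.g. tracklets) to columns (e.g. detections or later tracklets).

    Cost is 1 - affinity (an assumption); pairs whose affinity is below
    `threshold` are discarded after the optimal assignment is found.
    """
    if not affinity or not affinity[0]:
        return []
    cost = [[1.0 - a for a in row] for row in affinity]
    if len(affinity) <= len(affinity[0]):
        pairs = list(enumerate(hungarian(cost)))
    else:                            # solve the transposed problem, swap back
        cost_t = [list(c) for c in zip(*cost)]
        pairs = [(i, j) for j, i in enumerate(hungarian(cost_t))]
    return sorted((i, j) for i, j in pairs if affinity[i][j] >= threshold)
```

    The affinity matrix [[0.9, 0.8], [0.85, 0.1]] shows why an optimal solver is preferred over greedy matching. Greedily taking the largest entry (0.9) first forces the poor 0.1 link for the second row. The Hungarian assignment instead pairs row 0 with column 1 and row 1 with column 0, for a higher total affinity.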
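    In the second stage, the short tracklets from frame-by-frame association are linked across progressively longer temporal gaps. This page does not specify how the window adapts. The sketch below therefore takes an increasing list of window lengths (hypothetical values such as [5, 30]) and, at each level, considers only tracklet pairs whose gap fits within the current window. To stay short, it links pairs greedily by score instead of running the Hungarian algorithm, and it represents each tracklet only by its (start, end) frames. The `affinity` callable stands in for the appearance-and-motion model, and all names here are illustrative.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Span = Tuple[int, int]  # (start_frame, end_frame) of one tracklet


def candidate_pairs(tracklets: List[Span], window: int) -> List[Tuple[int, int]]:
    """Pairs (i, j) where tracklet j starts 1..window frames after i ends."""
    pairs = []
    for i, (_, end_i) in enumerate(tracklets):
        for j, (start_j, _) in enumerate(tracklets):
            if 0 < start_j - end_i <= window:
                pairs.append((i, j))
    return pairs


def multi_level_link(
    tracklets: List[Span],
    affinity: Callable[[Span, Span], float],
    windows: Sequence[int],
    threshold: float = 0.5,
) -> List[Span]:
    """Link tracklets level by level, one (growing) time window per level."""
    for window in windows:
        scored = sorted(
            ((affinity(tracklets[i], tracklets[j]), i, j)
             for i, j in candidate_pairs(tracklets, window)),
            reverse=True,
        )
        links: Dict[int, int] = {}   # earlier tracklet -> its successor
        used_head: Set[int] = set()  # tracklets that already have a predecessor
        for score, i, j in scored:   # greedy, best score first
            if score < threshold:
                break
            if i in links or j in used_head:
                continue             # at most one successor / predecessor each
            links[i] = j
            used_head.add(j)
        merged: List[Span] = []
        for i, (start, end) in enumerate(tracklets):
            if i in used_head:       # absorbed into a chain that starts earlier
                continue
            k = i
            while k in links:        # follow the chain i -> j -> ...
                k = links[k]
                end = tracklets[k][1]
            merged.append((start, end))
        tracklets = merged
    return tracklets
```

    Chains cannot loop because every link goes from an earlier end frame to a strictly later start frame. Small early windows commit only the safest short-gap links, and later levels bridge longer occlusions between the already-merged tracklets.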
  • Fig. 1  Overall framework of the multiple object tracking method

    Fig. 2  Training block diagram of the three-channel appearance model

    Fig. 3  Principle of the adaptive time sliding window

    Fig. 4  Motion model in multi-level association

    Fig. 5  Tracking results on MOT16-01 (frames 121, 174, and 248, from left to right)

    Fig. 6  Tracking results on MOT16-03 (frames 249, 307, and 424, from left to right)

    Fig. 7  Tracking results on MOT16-07 (frames 397, 455, and 500, from left to right)

    Fig. 8  Tracking results on MOT16-06 (frames 537, 806, and 1188, from left to right)

    Fig. 9  Tracking results on MVI_20032 (frames 332, 360, and 423, from left to right)

    Fig. 10  Tracking results on MVI_39771 (frames 1, 54, and 113, from left to right)

    Table 1  Results of the ablation study

    Tracker     MOTA (↑)  MOTP (↑)  MT (↑) (%)  ML (↓) (%)  FP (↓)  FN (↓)  IDS (↓)
    A + T       19.5      74.6      7.41        66.70       109     14,202  43
    M + T       17.6      74.6      7.40        64.80       307     14,326  70
    A + M + T   21.0      74.3      9.26        70.40       175     13,893  16
    A + M + V   14.7      75.1      1.85        67.00       60      14,804  339

    Table 2  Results on the MOT16 test set

    Tracker        Mode    MOTA (↑)  MOTP (↑)  MT (↑) (%)  ML (↓) (%)  FP (↓)   FN (↓)    IDS (↓)  Hz (↑)
    AMIR[20]       Online  47.2      75.8      14.0        41.6        2,681    92,856    774      1.0
    CDA[45]        Online  43.9      74.7      10.7        44.4        6,450    95,175    676      0.5
    Ours           Online  43.1      74.2      12.4        47.7        4,228    99,057    495      0.7
    EAMTT[41]      Online  38.8      75.1      7.9         49.1        8,114    102,452   965      11.8
    OVBT[42]       Online  38.4      75.4      7.5         47.3        11,517   99,463    1,321    0.3
    Quad-CNN[17]   Batch   44.1      76.4      14.6        44.9        6,388    94,775    745      1.8
    LIN1[43]       Batch   41.0      74.8      11.6        51.3        7,896    99,224    430      4.2
    CEM[44]        Batch   33.2      75.8      7.8         54.4        6,837    114,322   642      0.3

    Table 3  Results on the 2DMOT2015 test set

    Tracker          Mode    MOTA (↑)  MOTP (↑)  MT (↑) (%)  ML (↓) (%)  FP (↓)   FN (↓)   IDS (↓)  Hz (↑)
    AMIR[20]         Online  37.6      71.7      15.8        26.8        7,933    29,397   1,026    1.9
    Ours             Online  34.2      71.9      8.9         40.6        7,965    31,665   794      0.7
    CDA[45]          Online  32.8      70.7      9.7         42.2        4,983    35,690   614      2.3
    RNN_LSTM[24]     Online  19.0      71.0      5.5         45.6        11,578   36,706   1,490    165.2
    Quad-CNN[17]     Batch   33.8      73.4      12.9        36.9        7,879    32,061   703      3.7
    MHT_DAM[46]      Batch   32.4      71.8      16.0        43.8        9,064    32,060   435      0.7
    CNNTCM[22]       Batch   29.6      71.8      11.2        44.0        7,786    34,733   712      1.7
    Siamese CNN[19]  Batch   29.0      71.2      8.5         48.4        5,160    37,798   639      52.8
    LIN1[43]         Batch   24.5      71.3      5.5         64.6        5,864    40,207   298      7.5

    Table 4  Tracking results on the UA-DETRAC dataset

    Task              MOTA (↑)  MOTP (↑)  MT (↑) (%)  ML (↓) (%)  FP (↓)  FN (↓)  IDS (↓)
    Vehicle tracking  65.3      78.5      75.0        8.3         1,069   481     27
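    Tables 1-4 report the CLEAR MOT metrics [40]. MOTA combines the three error counts in the FP, FN, and IDS columns into a single score, MOTA = 1 - (FN + FP + IDS) / GT. Here GT is the total number of ground-truth boxes over all frames. MOTA can be negative when the errors outnumber the ground-truth objects. The helper below is a minimal sketch with a hypothetical function name; it returns the value as a percentage, matching the tables.

```python
def mota(fp: int, fn: int, ids: int, num_gt: int) -> float:
    """CLEAR MOT accuracy in percent: 100 * (1 - (FN + FP + IDS) / GT).

    fp, fn and ids are totals over the whole sequence (as in the FP, FN and
    IDS columns of the tables); num_gt is the total number of ground-truth
    boxes over all frames.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 100.0 * (1.0 - (fn + fp + ids) / num_gt)
```

    All three error types enter MOTA with equal weight. In Table 2, every tracker's FN count is roughly an order of magnitude larger than its FP count, so missed targets account for most of the MOTA penalty on MOT16.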
  • [1] Luo W H, Xing J L, Milan A, Zhang X Q, Liu W, Zhao X W, et al. Multiple object tracking: A literature review. arXiv preprint arXiv: 1409.7618, 2014.
    [2] Felzenszwalb P F, Girshick R B, McAllester D, Ramanan D. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9): 1627-1645 doi: 10.1109/TPAMI.2009.167
    [3] Girshick R. Fast R-CNN. In: Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 1440-1448
    [4] Yin Hong-Peng, Chen Bo, Chai Yi, Liu Zhao-Dong. Vision-based object detection and tracking: A review. Acta Automatica Sinica, 2016, 42(10): 1466-1489 doi: 10.16383/j.aas.2016.c150823
    [5] Xiang J, Sang N, Hou J H, Huang R, Gao C X. Hough forest-based association framework with occlusion handling for multi-target tracking. IEEE Signal Processing Letters, 2016, 23(2): 257-261 doi: 10.1109/LSP.2015.2512878
    [6] Yang B, Nevatia R. Multi-target tracking by online learning of non-linear motion patterns and robust appearance models. In: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE, 2012. 1918-1925
    [7] Nummiaro K, Koller-Meier E, Van Gool L. An adaptive color-based particle filter. Image and Vision Computing, 2003, 21(1): 99-110 doi: 10.1016/S0262-8856(02)00129-4
    [8] Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA: IEEE, 2005. 886-893
    [9] Tuzel O, Porikli F, Meer P. Region covariance: A fast descriptor for detection and classification. In: Proceedings of the 2006 European Conference on Computer Vision. Graz, Austria: Springer, 2006. 589-600
    [10] Xiang J, Sang N, Hou J H, Huang R, Gao C X. Multitarget tracking using Hough forest random field. IEEE Transactions on Circuits and Systems for Video Technology, 2016, 26(11): 2028-2042 doi: 10.1109/TCSVT.2015.2489438
    [11] Milan A, Leal-Taixé L, Reid I, Roth S, Schindler K. MOT16: A benchmark for multi-object tracking. arXiv preprint arXiv: 1603.00831, 2016.
    [12] Leal-Taixé L, Milan A, Reid I, Roth S, Schindler K. MOTChallenge 2015: Towards a benchmark for multi-target tracking. arXiv preprint arXiv: 1504.01942, 2015.
    [13] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: MIT, 2012. 1097-1105
    [14] Guan Hao, Xue Xiang-Yang, An Zhi-Yong. Advances on application of deep learning for video object tracking. Acta Automatica Sinica, 2016, 42(6): 834-847 doi: 10.16383/j.aas.2016.c150705
    [15] Bertinetto L, Valmadre J, Henriques J F, Vedaldi A, Torr P H S. Fully-convolutional siamese networks for object tracking. In: Proceedings of the 2016 European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016. 850-865
    [16] Danelljan M, Robinson A, Khan F S, Felsberg M. Beyond correlation filters: Learning continuous convolution operators for visual tracking. In: Proceedings of the 2016 European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016. 472-488
    [17] Son J, Baek M, Cho M, Han B. Multi-object tracking with quadruplet convolutional neural networks. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 3786-3795
    [18] Emami P, Pardalos P M, Elefteriadou L, Ranka S. Machine learning methods for solving assignment problems in multi-target tracking. arXiv preprint arXiv: 1802.06897, 2018.
    [19] Leal-Taixé L, Canton-Ferrer C, Schindler K. Learning by tracking: Siamese CNN for robust target association. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Las Vegas, USA: IEEE, 2016. 418-425
    [20] Sadeghian A, Alahi A, Savarese S. Tracking the untrackable: Learning to track multiple cues with long-term dependencies. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 300-311
    [21] Tang S Y, Andriluka M, Andres B, Schiele B. Multiple people tracking by lifted multicut and person re-identification. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 3701-3710
    [22] Wang B, Wang L, Shuai B, Zuo Z, Liu T, Chan K L, Wang G. Joint learning of convolutional neural networks and temporally constrained metrics for tracklet association. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Las Vegas, USA: IEEE, 2016. 368-393
    [23] Gold S, Rangarajan A. Softmax to softassign: Neural network algorithms for combinatorial optimization. Journal of Artificial Neural Networks, 1996, 2(4): 381-399 http://dl.acm.org/citation.cfm?id=235919
    [24] Milan A, Rezatofighi S H, Dick A, Schindler K, Reid I. Online multi-target tracking using recurrent neural networks. In: Proceedings of the 2017 AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI, 2017. 2-4
    [25] Beyer L, Breuers S, Kurin V, Leibe B. Towards a principled integration of multi-camera re-identification and tracking through optimal Bayes filters. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Honolulu, USA: IEEE, 2017. 1444-1453
    [26] Farazi H, Behnke S. Online visual robot tracking and identification using deep LSTM networks. In: Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vancouver, Canada: IEEE, 2017. 6118-6125
    [27] Kuo C H, Nevatia R. How does person identity recognition help multi-person tracking? In: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE, 2011. 1217-1224
    [28] Xiao Q Q, Luo H, Zhang C. Margin sample mining loss: A deep learning based method for person re-identification. arXiv preprint arXiv: 1710.00478, 2017.
    [29] Huang C, Wu B, Nevatia R. Robust object tracking by hierarchical association of detection responses. In: Proceedings of the 2008 European Conference on Computer Vision. Marseille, France: Springer, 2008. 788-801
    [30] Cheng D, Gong Y H, Zhou S P, Wang J J, Zheng N N. Person re-identification by multi-channel parts-based CNN with improved triplet loss function. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 1335-1344
    [31] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 770-778
    [32] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ACM, 2015. 448-456
    [33] Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. Fort Lauderdale, USA: JMLR, 2011. 315-323
    [34] Schroff F, Kalenichenko D, Philbin J. FaceNet: A unified embedding for face recognition and clustering. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015. 815-823
    [35] Zheng L, Shen L Y, Tian L, Wang S J, Wang J D, Tian Q. Scalable person re-identification: A benchmark. In: Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 1116-1124
    [36] Li W, Zhao R, Xiao T, Wang X G. DeepReID: Deep filter pairing neural network for person re-identification. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE, 2014. 152-159
    [37] Hermans A, Beyer L, Leibe B. In defense of the triplet loss for person re-identification. arXiv preprint arXiv: 1703.07737, 2017.
    [38] Kingma D P, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv: 1412.6980, 2014.
    [39] Yang B, Nevatia R. Multi-target tracking by online learning a CRF model of appearance and motion patterns. International Journal of Computer Vision, 2014, 107(2): 203-217 doi: 10.1007/s11263-013-0666-4
    [40] Bernardin K, Stiefelhagen R. Evaluating multiple object tracking performance: The CLEAR MOT metrics. EURASIP Journal on Image and Video Processing, 2008, 2008: 246309 http://dl.acm.org/citation.cfm?id=1453688
    [41] Sanchez-Matilla R, Poiesi F, Cavallaro A. Online multi-target tracking with strong and weak detections. In: Proceedings of the 2016 European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016. 84-99
    [42] Ban Y T, Ba S, Alameda-Pineda X, Horaud R. Tracking multiple persons based on a variational Bayesian model. In: Proceedings of the 2016 European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016. 52-67
    [43] Fagot-Bouquet L, Audigier R, Dhome Y, Lerasle F. Improving multi-frame data association with sparse representations for robust near-online multi-object tracking. In: Proceedings of the 2016 European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016. 774-790
    [44] Milan A, Roth S, Schindler K. Continuous energy minimization for multitarget tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(1): 58-72 doi: 10.1109/TPAMI.2013.103
    [45] Bae S H, Yoon K J. Confidence-based data association and discriminative deep appearance learning for robust online multi-object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(3): 595-610 doi: 10.1109/TPAMI.2017.2691769
    [46] Kim C, Li F X, Ciptadi A, Rehg J M. Multiple hypothesis tracking revisited. In: Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 4696-4704
    [47] Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE, 2014. 580-587
Publication History
  • Received: 2018-08-02
  • Accepted: 2019-01-09
  • Published: 2020-12-29
