

Learning Motion Guidance for Efficient Unsupervised Video Object Segmentation

Zhao Zi-Cheng, Zhang Kai-Hua, Fan Jia-Qing, Liu Qing-Shan

Citation: Zhao Zi-Cheng, Zhang Kai-Hua, Fan Jia-Qing, Liu Qing-Shan. Learning motion guidance for efficient unsupervised video object segmentation. Acta Automatica Sinica, 2021, 48(x): 1001−1009. doi: 10.16383/j.aas.c210626

doi: 10.16383/j.aas.c210626
Funds: Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0100400), the National Natural Science Foundation of China (61876088, U20B2065, 61532009), and the 333 High-Level Talents Project of Jiangsu Province (BRA2020291)
Details
    Author Bio:

    ZHAO Zi-Cheng: Master student at the School of Automation, Nanjing University of Information Science and Technology. His research interests cover video object segmentation and deep learning. E-mail: 20191222013@nuist.edu.cn

    ZHANG Kai-Hua: Professor at the School of Automation, Nanjing University of Information Science and Technology. His research interests cover video object segmentation and visual tracking. Corresponding author of this paper. E-mail: zhkhua@gmail.com

    FAN Jia-Qing: PhD candidate at the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics. His research interest is video object segmentation. E-mail: jqfan@nuaa.edu.cn

    LIU Qing-Shan: Professor at the School of Automation, Nanjing University of Information Science and Technology. His research interests cover video content analysis and understanding. E-mail: qsliu@nuist.edu.cn


  • Abstract: Many deep-learning-based unsupervised video object segmentation algorithms suffer from large numbers of model parameters and heavy computation, which significantly limits their practical application. This paper proposes a motion-guided video object segmentation network that greatly reduces model parameters and computation while improving segmentation performance. The model consists of three parts: a two-stream network, a motion guidance module, and a multi-scale progressive fusion module. Specifically, the RGB image and the estimated optical flow are fed into the two-stream network to extract object appearance features and motion features. The motion guidance module then extracts semantic information from the motion features via local attention and uses it to guide the appearance features toward learning rich semantic information. Finally, the multi-scale progressive fusion module takes the features output at each stage of the two-stream network and progressively fuses deep features into shallow ones, improving the segmentation of object boundaries. Extensive evaluations on three standard datasets demonstrate the superior performance of the proposed method.
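The motion-guided local attention described in the abstract can be illustrated with a toy, plain-Python sketch: attention weights are computed from motion features within a small k × k window and then used to aggregate appearance features. The grid shapes, the plain dot-product similarity, and the absence of learned projections are simplifying assumptions for illustration, not the paper's exact formulation.

```python
import math

def local_attention(appearance, motion, k=3):
    """Aggregate appearance features under attention weights derived from
    motion features in a k x k local window (illustrative sketch only).
    appearance, motion: H x W grids of C-dim feature vectors (nested lists,
    same C assumed for both). Returns an H x W grid of guided features."""
    H, W, C = len(appearance), len(appearance[0]), len(appearance[0][0])
    r = k // 2
    out = [[[0.0] * C for _ in range(W)] for _ in range(H)]
    for i in range(H):
        for j in range(W):
            # similarity of the motion feature at (i, j) to its neighbours
            scores, coords = [], []
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W:
                        s = sum(a * b for a, b in zip(motion[i][j], motion[ni][nj]))
                        scores.append(s / math.sqrt(C))
                        coords.append((ni, nj))
            # softmax restricted to the local window (the source of the savings)
            m = max(scores)
            w = [math.exp(s - m) for s in scores]
            z = sum(w)
            for wt, (ni, nj) in zip(w, coords):
                for c in range(C):
                    out[i][j][c] += wt / z * appearance[ni][nj][c]
    return out
```

Because each position only attends within its window, the cost grows with k² rather than with the number of positions, which is the property the paper exploits to keep the module light.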
  • Fig.  1  Network architecture

    Fig.  2  Attention structure

    Fig.  3  UNet-style upsampling and multi-scale progressive fusion module

    Fig.  4  Comparison of segmentation results

    Fig.  5  Segmentation results

    Table  1  Comparison of FLOPs of different modules

    Input size       Co-attention module   Motion guidance module
    $64 \times 64 \times 16$   10.0 M     2.3 M
    $64 \times 32 \times 32$   153.1 M    9.0 M
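The gap in Table 1 comes from how attention cost scales. A rough counting sketch, under stated assumptions: only similarity scores and the weighted sum are counted (2·C multiply-adds per attended pair), and softmax and any learned projections are ignored, so the constants will not reproduce the paper's measured numbers; the point is the N² versus N·k² scaling of global co-attention versus local motion guidance.

```python
def attn_flops(H, W, C, k=None):
    """Rough multiply-add count for attention over an H x W x C feature map.
    k=None: global co-attention, every position attends to all H*W positions.
    k=3, 5, ...: local attention restricted to a k x k window.
    Only similarity scores and the weighted sum are counted."""
    n = H * W
    window = n if k is None else k * k
    return n * window * 2 * C

# global attention grows quadratically in the number of positions,
# local attention only linearly:
print(attn_flops(64, 64, 16))        # → 536870912
print(attn_flops(64, 64, 16, k=3))   # → 1179648
```

At a 64 × 64 map the local 3 × 3 variant is 4096/9 ≈ 455 times cheaper under this count, and the ratio worsens quadratically as resolution grows.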

    Table  2  Evaluation results of different methods on the DAVIS-16 and FBMS datasets

                    DAVIS-16                  FBMS
    Method          $J\&F$   $J$     $F$     $J$
    LMP[18]         68.0     70.0    65.9    —
    LVO[17]         74.0     75.9    72.1    —
    PDB[14]         75.9     77.0    74.5    74.0
    MBNM[19]        79.5     80.4    78.5    73.9
    AGS[20]         78.6     79.7    77.4    —
    COSNet[10]      80.0     80.5    79.4    75.6
    AGNN[9]         79.9     80.7    79.1    —
    AnDiff[21]      81.1     81.7    80.5    —
    MATNet[15]      81.6     82.4    80.7    76.1
    Ours            83.6     83.7    83.4    75.9

    Table  3  Evaluation results of different methods on the DAVIS-16, FBMS and ViSal datasets

                    DAVIS-16         FBMS             ViSal
    Method          MAE    ${F_m}$   MAE    ${F_m}$   MAE    ${F_m}$
    FCNS[23]        .053   72.9      .100   73.5      .041   87.7
    FGRNE[24]       .043   78.6      .083   77.9      .040   85.0
    TENET[22]       .019   90.4      .026   89.7      .014   94.9
    MBNM[19]        .031   86.2      .047   81.6      .047   —
    PDB[14]         .030   84.9      .069   81.5      .022   91.7
    AnDiff[21]      .044   80.8      .064   81.2      .030   90.4
    Ours            .014   92.4      .059   84.2      .019   92.1

    Table  4  Model parameters, FLOPs and inference latency of different methods

    Method          COSNet[10]           MATNet[15]           Ours
    Resolution      $473 \times 473$     $473 \times 473$     $384 \times 672$
    #Params (M)     81.2                 142.7                6.4
    FLOPs (G)       585.5                193.7                5.4
    Latency (ms)    65                   78                   15

    Table  5  Performance of different methods on a GTX 2080 Ti

    Method          Concurrency   FPS   Latency
    MATNet[15]      18            16    62.4 ms
    Ours            130           16    16.21 ms

    Table  6  Ablation experiments. FG and U represent the motion guidance module and the multi-scale progressive fusion module, respectively

             Ours    $-FG-U$   $-FG$
    $J$      83.7    75.8      76.1
    $F$      83.4    73.5      75.6

    Table  7  Comparison of different kernel sizes and cascading times for the proposed method

    Kernel   Cascade   $J$     $F$
    3        1         82.8    82.4
    3        2         83.4    82.7
    3        3         83.7    83.4
    3        4         83.5    83.2
    5        1         83.2    82.6
    7        1         83.4    82.7
    9        1         83.1    82.4
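One way to read Table 7, offered as an illustrative interpretation rather than a claim from the paper: cascading small local-attention windows enlarges the effective receptive field the way stacked small convolutions do, so three cascaded 3 × 3 windows span roughly as far as a single 7 × 7 window while keeping per-layer cost low. A hypothetical helper, by analogy with stacked convolutions:

```python
def effective_window(kernel, cascades):
    """Effective receptive-field width after stacking `cascades`
    local-attention layers, each with a `kernel` x `kernel` window.
    By analogy with stacked convolutions, each extra layer widens
    the span by (kernel - 1)."""
    return kernel + (kernel - 1) * (cascades - 1)

print(effective_window(3, 3))  # → 7, same span as one 7x7 window
print(effective_window(9, 1))  # → 9
```

Under this reading, kernel 3 with cascade 3 matches the reach of kernel 7 with cascade 1 but adds depth, which is consistent with it scoring best in the table.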
  • [1] Papazoglou A, Ferrari V. Fast object segmentation in unconstrained video. In: Proceedings of the 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE, 2013. 1777−1784
    [2] Wang W, Shen J, Porikli F. Saliency-aware geodesic video object segmentation. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015. 3395−3402
[3] Huang Hong-Tu, Bi Du-Yan, Hou Zhi-Qiang, Hu Chang-Cheng, Gao Shan, Zha Yu-Fei, Ku Tao. Research of sparse representation-based visual object tracking: a survey. Acta Automatica Sinica, 2018, 44(10): 1747−1763 (in Chinese)
[4] Su Liang-Liang, Tang Jun, Liang Dong, Wang Nian. A video co-segmentation algorithm by means of maximizing submodular function and RRWM. Acta Automatica Sinica, 2016, 42(10): 1532−1541 (in Chinese)
[5] Qian Sheng, Chen Zong-Hai, Lin Ming-Qiang, Zhang Chen-Bin. Saliency detection based on conditional random field and image segmentation. Acta Automatica Sinica, 2015, 41(4): 711−724 (in Chinese)
[6] Ochs P, Malik J, Brox T. Segmentation of moving objects by long term video analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(6): 1187−1200
    [7] Ochs P, Brox T. Object segmentation in video: a hierarchical variational approach for turning point trajectories into dense regions. In: Proceedings of the 2011 IEEE International Conference on Computer Vision. Barcelona, Spain: IEEE, 2011. 1583−1590
[8] Ventura C, Bellver M, Girbau A, Salvador A, Marques F, Giro-i-Nieto X. RVOS: end-to-end recurrent network for video object segmentation. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 5277−5286
    [9] Wang W, Lu X, Shen J, Crandall D J, Shao L. Zero-shot video object segmentation via attentive graph neural networks. In: Proceedings of the 2019 IEEE International Conference on Computer Vision. Seoul, Korea: IEEE, 2019. 9236−9245
    [10] Lu X, Wang W, Ma C, Shen J, Shao L, Porikli F. See more, know more: unsupervised video object segmentation with co-attention siamese networks. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 3623−3632
[11] Faktor A, Irani M. Video segmentation by non-local consensus voting. In: Proceedings of the 2014 British Machine Vision Conference. Nottingham, UK, 2014, 2(7): 8
[12] Perazzi F, Pont-Tuset J, McWilliams B, Van-Gool L, Gross M, Sorkine-Hornung A. A benchmark dataset and evaluation methodology for video object segmentation. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 724−732
[13] Xu N, Yang L J, Fan Y C, Yang J C, Yue D C, Liang Y C, Cohen S, Huang T. YouTube-VOS: sequence-to-sequence video object segmentation. In: Proceedings of the 2018 European Conference on Computer Vision. Munich, Germany: Springer, 2018. 585−601
    [14] Song H, Wang W, Zhao S, Shen J, Lam K M. Pyramid dilated deeper ConvLSTM for video salient object detection. In: Proceedings of the 2018 European Conference on Computer Vision. Munich, Germany: Springer, 2018. 715−731
    [15] Zhou T, Li J, Wang S, Tao R, Shen J. MATNet: motion-attentive transition network for zero-shot video object segmentation. IEEE Transactions on Image Processing, 2020, 29: 8326−8338
    [16] Jampani V, Gadde R, Gehler P V. Video propagation networks. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017. 451−461
    [17] Tokmakov P, Alahari K, Schmid C. Learning video object segmentation with visual memory. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 4481−4490
    [18] Tokmakov P, Alahari K, Schmid C. Learning motion patterns in videos. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017. 3386−3394
[19] Li S, Seybold B, Vorobyov A, Lei X, Kuo C C J. Unsupervised video object segmentation with motion-based bilateral networks. In: Proceedings of the 2018 European Conference on Computer Vision. Munich, Germany: Springer, 2018. 207−223
    [20] Wang W, Song H, Zhao S, Shen J, Zhao S, Hoi S C, Ling H. Learning unsupervised video object segmentation through visual attention. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 3064−3074
[21] Yang Z, Wang Q, Bertinetto L, Hu W, Bai S, Torr P H S. Anchor diffusion for unsupervised video object segmentation. In: Proceedings of the 2019 IEEE International Conference on Computer Vision. Seoul, Korea: IEEE, 2019. 931−940
[22] Ren S, Han C, Yang X, Han G, He S. TENet: triple excitation network for video salient object detection. In: Proceedings of the 2020 European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 212−228
[23] Wang W, Shen J, Shao L. Video salient object detection via fully convolutional networks. IEEE Transactions on Image Processing, 2017, 27(1): 38−49
    [24] Li G, Xie Y, Wei T, Wang K, Lin L. Flow guided recurrent neural encoder for video salient object detection. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 3243−3252
[25] Wang W, Shen J, Shao L. Consistent video saliency using local gradient flow optimization and global refinement. IEEE Transactions on Image Processing, 2015, 24(11): 4185−4196
[26] Chen L C, Papandreou G, Schroff F, Adam H. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587, 2017
    [27] Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L C. Mobilenetv2: Inverted residuals and linear bottlenecks. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 4510−4520
[28] Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Proceedings of the 2015 International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer, 2015. 234−241
[29] Chu X, Yang W, Ouyang W, Ma C, Yuille A L, Wang X. Multi-context attention for human pose estimation. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017. 1831−1840
    [30] Chen L, Zhang H, Xiao J, Nie L, Shao J, Liu W, Chua T S. Sca-cnn: Spatial and channel-wise attention in convolutional networks for image captioning. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017. 5659−5667
[31] Lu J, Yang J, Batra D, Parikh D. Hierarchical question-image co-attention for visual question answering. arXiv preprint arXiv:1606.00061, 2016
    [32] Wu Q, Wang P, Shen C, Reid I, Van-Den-Hengel A. Are you talking to me? reasoned visual dialog generation through adversarial learning. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 6106−6115
Publication history
  • Received:  2021-07-06
  • Revised:  2021-10-18
  • Published online:  2021-11-20
