

A Feature Pyramid Optical Flow Estimation Method Based on Multi-scale Deformable Convolution

Fan Bing-Bing, Ge Li-Yue, Zhang Cong-Xuan, Li Bing, Feng Cheng, Chen Zhen

Citation: Fan Bing-Bing, Ge Li-Yue, Zhang Cong-Xuan, Li Bing, Feng Cheng, Chen Zhen. A feature pyramid optical flow estimation method based on multi-scale deformable convolution. Acta Automatica Sinica, 2022, 45(x): 1001−1013. doi: 10.16383/j.aas.c220142


A Feature Pyramid Optical Flow Estimation Method Based on Multi-scale Deformable Convolution

Funds: Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Program (2020AAA0105802, 2020AAA0105801, 2020AAA0105800), the National Key Research and Development Program of China (2020YFC2003800), the National Natural Science Foundation of China (62222206, 62272209, 61866026, 61772255, 61866025), the Technological Innovation Guidance Program of Jiangxi Province (20212AEI91005), the Science and Technology Program of the Education Department of Jiangxi Province (GJJ210910), the Advantage Science and Technology Innovation Team of Jiangxi Province (20165BCB19007), the Key Program of the Natural Science Foundation of Jiangxi Province (20202ACB214007), and the Open Fund of the Jiangxi Key Laboratory for Image Processing and Pattern Recognition (ET202104413)
More Information
    Author Bio:

    FAN Bing-Bing Master student at the School of Measuring and Optical Engineering, Nanchang Hangkong University. His main research interest is computer vision. E-mail: 1908080400123@stu.nchu.edu.cn

    GE Li-Yue Assistant experimenter at the School of Information Engineering, Nanchang Hangkong University. His research interest covers image detection and intelligent recognition. E-mail: lygeah@163.com

    ZHANG Cong-Xuan Professor at the School of Measuring and Optical Engineering, Nanchang Hangkong University. He received his Ph.D. degree from Nanjing University of Aeronautics and Astronautics in 2014. His research interest covers image processing and computer vision. Corresponding author of this paper. E-mail: zcxdsg@163.com

    LI Bing Researcher at the National Key Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. He received his Ph.D. degree from Beijing Jiaotong University in 2009. His research interest covers video content understanding and multimedia content security. E-mail: bli@nlpr.ia.ac.cn

    FENG Cheng Ph.D. candidate at the School of Instrumentation and Optoelectronic Engineering, Beihang University. His research interest covers image processing and computer vision. E-mail: fengcheng00016@163.com

    CHEN Zhen Professor at the School of Measuring and Optical Engineering, Nanchang Hangkong University. He received his Ph.D. degree from Northwestern Polytechnical University in 2003. His research interest covers image processing and computer vision. E-mail: dr_chenzhen@163.com

  • Abstract: To address the motion-edge blurring problem of existing deep-learning optical flow methods, a feature pyramid optical flow estimation method based on multi-scale deformable convolution is proposed. First, a feature extraction model based on multi-scale deformable convolution is constructed, which significantly improves the accuracy of feature extraction in image edge regions. Then, this feature extraction model is coupled with a feature pyramid optical flow network, yielding a feature pyramid optical flow estimation model based on multi-scale deformable convolution. Finally, a hybrid loss function combining image and motion-edge constraints is designed; by guiding the model to learn more accurate edge information, it overcomes the motion-edge blurring problem. The proposed method is comprehensively compared with representative deep-learning optical flow methods on the MPI-Sintel and KITTI2015 test sets. Experimental results show that it achieves higher optical flow accuracy and effectively resolves the edge blurring problem.
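The deformable convolution the abstract builds on replaces the fixed sampling grid of a standard convolution with learned per-position offsets, so the kernel can follow object boundaries. A minimal single-channel NumPy sketch of this sampling scheme (the function names and the `offsets` layout are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Bilinearly sample a 2D image at a real-valued location (y, x),
    treating out-of-bounds pixels as zero (zero padding)."""
    H, W = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    wy, wx = y - y0, x - x0

    def px(r, c):
        return img[r, c] if (0 <= r < H and 0 <= c < W) else 0.0

    return ((1 - wy) * (1 - wx) * px(y0, x0) + (1 - wy) * wx * px(y0, x0 + 1)
            + wy * (1 - wx) * px(y0 + 1, x0) + wy * wx * px(y0 + 1, x0 + 1))

def deformable_conv2d(img, weight, offsets):
    """Single-channel deformable convolution.
    img:     (H, W) input image
    weight:  (k, k) convolution kernel
    offsets: (H, W, k, k, 2) learned (dy, dx) displacement for every
             kernel tap at every output position
    Each tap samples the input at its regular grid position plus its
    learned offset, using bilinear interpolation for fractional locations."""
    H, W = img.shape
    k = weight.shape[0]
    r = k // 2
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    dy, dx = offsets[i, j, a, b]
                    acc += weight[a, b] * bilinear_sample(
                        img, i + a - r + dy, j + b - r + dx)
            out[i, j] = acc
    return out
```

With all offsets zero this reduces exactly to a standard convolution; in the network the offsets are themselves predicted by a small convolutional branch, which is what lets the effective receptive field deform around motion edges.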
  • Fig. 1 Schematic diagram of standard convolution and deformable convolution image feature extraction, with the corresponding optical flow estimation results

    Fig. 2 Schematic diagram of the multi-scale deformable convolution feature extraction network structure

    Fig. 3 Visual comparison of the feature extraction results of the proposed method and standard convolution

    Fig. 4 Feature pyramid optical flow network model based on multi-scale deformable convolution

    Fig. 5 Comparison of optical flow results of multi-scale deformable convolution models with different numbers of layers

    Fig. 6 Training models with different loss functions

    Fig. 7 Visual comparison of optical flow results after training with different loss functions

    Fig. 8 Visual comparison of optical flow results on the MPI-Sintel dataset

    Fig. 9 Comparison of optical flow error results on the KITTI2015 dataset

    Fig. 10 Visual comparison of optical flow results for each ablation model; the second and fourth rows are enlarged views of the labeled regions

    Table 1 Optical flow estimation results of image sequences on the MPI-Sintel dataset

    Method                        Clean                         Final
                          All    Matched  Unmatched    All    Matched  Unmatched
    FlowNet 2.0[15]       4.16   1.56     25.40        5.74   2.75     30.11
    PWC-Net[16]           4.39   1.72     26.17        5.04   2.45     26.22
    IRR-PWC_RVC[19]       3.79   2.04     18.04        4.80   2.77     21.34
    FDFlowNet[31]         3.71   1.54     21.38        5.11   2.52     26.23
    FastFlowNet[25]       4.89   1.79     30.18        6.08   2.94     31.69
    Semantic_Lattice[28]  3.84   1.70     21.30        4.89   2.46     24.70
    OAS-Net[29]           3.65   1.49     21.32        5.04   2.46     25.86
    Ours                  3.43   1.31     20.79        4.78   2.32     24.77

    Table 2 Comparison of motion-edge and large-displacement metrics on the MPI-Sintel dataset

    Clean:
    Method                d0-10  d10-60  d60-140  s0-10  s10-40  s40+
    FlowNet 2.0[15]       3.09   1.32    0.92     0.64   1.90    25.42
    PWC-Net[16]           4.28   1.66    0.67     0.61   2.07    28.79
    IRR-PWC_RVC[19]       4.05   1.70    1.04     0.68   2.11    23.23
    FDFlowNet[31]         3.81   1.42    0.69     0.84   2.20    21.63
    FastFlowNet[25]       4.25   1.64    0.91     0.81   2.36    31.24
    Semantic_Lattice[28]  3.86   1.43    0.80     0.60   2.00    24.40
    OAS-Net[29]           3.81   1.39    0.59     0.75   2.13    21.78
    Ours                  3.15   1.15    0.59     0.64   1.78    21.33

    Final:
    Method                d0-10  d10-60  d60-140  s0-10  s10-40  s40+
    FlowNet 2.0[15]       5.14   2.79    2.10     1.24   4.03    34.51
    PWC-Net[16]           4.64   2.09    1.48     0.80   2.99    31.07
    IRR-PWC_RVC[19]       5.06   2.55    1.66     0.81   3.20    28.45
    FDFlowNet[31]         4.67   2.17    1.64     1.03   3.12    30.16
    FastFlowNet[25]       5.20   2.56    2.04     1.07   3.41    37.44
    Semantic_Lattice[28]  4.60   2.08    1.53     0.80   3.02    29.65
    OAS-Net[29]           4.54   2.05    1.57     0.88   2.91    30.63
    Ours                  4.13   1.87    1.59     0.85   2.60    29.51

    Table 3 Estimation results on the KITTI2015 dataset

    Method                Fl-bg    Fl-fg    Fl-all   Time (s)
    FlowNet 2.0[15]       10.75%   8.75%    10.41%   0.12
    PWC-Net[16]           9.66%    9.31%    9.60%    0.07
    IRR-PWC_RVC[19]       7.61%    12.22%   8.38%    0.18
    LiteFlowNet[26]       9.66%    7.99%    9.38%    0.09
    FlowNet3[27]          9.82%    10.91%   10.00%   0.09
    LSM_RVC[30]           7.33%    13.06%   8.28%    0.25
    FDFlowNet[31]         9.31%    9.71%    9.38%    0.05
    Ours                  7.25%    10.06%   7.72%    0.13

    Table 4 Comparison of ablation results on the MPI-Sintel dataset

    Ablation model   All    Matched  Unmatched  d0-10  d10-60  d60-140
    baseline         4.39   1.72     26.17      4.28   1.66    0.67
    baseline_loss    4.03   1.63     23.76      3.17   1.25    0.97
    baseline_md      4.19   1.69     24.58      3.32   1.35    0.98
    full model       3.43   1.31     20.79      3.15   1.15    0.59
  • [1] Fu Jing-Yi, Yu Lei, Yang Wen, Lu Xin. Event-based continuous optical flow estimation. Acta Automatica Sinica, to be published, DOI: 10.16383/j.aas.c210242
    [2] Mahapatra D, Ge Z Y. Training data independent image registration using generative adversarial networks and domain adaptation. Pattern Recognition, 2020, 100: Article No. 107109
    [3] Zhang X S, Jia J P, Cheng Y H, Wang X S. Micro-expression recognition algorithm based on 3D convolutional neural network and optical flow fields from neighboring frames of apex frame. Pattern Recognition and Artificial Intelligence, 2021, 34(5): 423-433 doi: 10.16451/j.cnki.issn1003-6059.202105005
    [4] Feng Cheng, Zhang Cong-Xuan, Chen Zhen, Li Bing, Li Ming. Occlusion detection based on optical flow and multiscale context aggregation. Acta Automatica Sinica, to be published, 2021, DOI: 10.16383/j.aas.c210324
    [5] Bahraini M S, Zenati A, Aouf N. Autonomous cooperative visual navigation for planetary exploration robots. In: Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA). Xi'an, China: IEEE, 2021. 9653−9658
    [6] Zhai M L, Xiang X Z, Lv N, Kong X D. Optical flow and scene flow estimation: A survey. Pattern Recognition, 2021, 114: Article No. 107861
    [7] Rao S N, Wang H Z. Robust optical flow estimation via edge preserving filtering. Signal Processing: Image Communication, 2021, 96: Article No. 116309
    [8] Zhang C X, Chen Z, Wang M R, Li M, Jiang S F. Robust non-local TV-L1 optical flow estimation with occlusion detection. IEEE Transactions on Image Processing, 2017, 26: 4055-4067 doi: 10.1109/TIP.2017.2712279
    [9] Mei L, Lai J H, Xie X H, Zhu J Y, Chen J. Illumination-invariance optical flow estimation using weighted regularization transform. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(2): 495-508 doi: 10.1109/TCSVT.2019.2890861
    [10] Chen J, Cai Z, Lai J H, Xie X H. Efficient segmentation-based PatchMatch for large displacement optical flow estimation. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(12): 3595-3607 doi: 10.1109/TCSVT.2018.2885246
    [11] Deng Y, Xiao J M, Zhou S Z, Feng J S. Detail preserving coarse-to-fine matching for stereo matching and optical flow. IEEE Transactions on Image Processing, 2021, 30: 5835-5847 doi: 10.1109/TIP.2021.3088635
    [12] Zhang C X, Ge L Y, Chen Z, Li M, Liu W, Chen H. Refined TV-L1 optical flow estimation using joint filtering. IEEE Transactions Multimedia, 2020, 22(2): 349-364 doi: 10.1109/TMM.2019.2929934
    [13] Dong C, Wang Z S, Han J M, Xing C D, Tang S F. A non-local propagation filtering scheme for edge-preserving in variational optical flow computation. Signal Processing: Image Communication, 2021, 93: Article No. 116143
    [14] Dosovitskiy A, Fischer P, Ilg E, Hausser P, Hazirbas C, Golkov V. FlowNet: Learning optical flow with convolutional networks. In: Proceedings of the 2015 International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015, 2758−2766
    [15] Ilg E, Mayer N, Saikia T, Keuper M, Dosovitskiy A, Brox T. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In: Proceedings of the 2017 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017, 1647−1655
    [16] Sun D Q, Yang X D, Liu M Y, Jan K. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: Proceedings of the 2018 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE, 2018, 8934−8943
    [17] Yu J J, Harley A W, Derpanis K G. Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness. In: Proceedings of the 2016 European Conference on Computer Vision (ECCV). Amsterdam, The Netherlands: Springer, 2016, 3−10
    [18] Liu P P, King I, Lyu M R, Xu J. DDFlow: Learning optical flow with unlabeled data distillation. In: Proceedings of the 2019 AAAI Conference on Artificial Intelligence. Phoenix, USA: AAAI, 2019, 2−8
    [19] Hur J, Roth S. Iterative residual refinement for joint optical flow and occlusion estimation. In: Proceedings of the 2019 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019, 5747−5756
    [20] Zhao S Y, Sheng Y L, Dong Y, Chang E I, Xu Y. MaskFlownet: Asymmetric feature matching with learnable occlusion mask. In: Proceedings of the 2020 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020, 6277−6286
    [21] Meister S, Hur J, Roth S. UnFlow: Unsupervised learning of optical flow with a bidirectional census loss. In: Proceedings of the 2018 AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI, 2018, 7251−7259
    [22] Zhang C X, Zhou Z Z, Chen Z, Hu W M, Li M, Jiang S F. Self-attention-based multiscale feature learning optical flow with occlusion feature map prediction. IEEE Transactions on Multimedia, 2021: 3340-3354 doi: 10.1109/TMM.2021.3096083, to be published
    [23] Butler D J, Wulff J, Stanley G B, Black M J. A naturalistic open source movie for optical flow evaluation. In: Proceedings of the 2012 European Conference on Computer Vision (ECCV). Florence, Italy: Springer, 2012, 611−625
    [24] Menze M, Geiger A. Object scene flow for autonomous vehicles. In: Proceedings of the 2015 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE, 2015, 3061−3070
    [25] Kong L T, Shen C H, Yang J. FastFlowNet: A lightweight network for fast optical flow estimation. In: Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA). Xi'an, China: IEEE, 2021, 10310−10316
    [26] Hui T W, Tang X O, Loy C C. LiteFlowNet: A lightweight convolutional neural network for optical flow estimation. In: Proceedings of the 2018 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE, 2018, 8981−8989
    [27] Ilg E, Saikia T, Keuper M, Brox T. Occlusions, motion and depth boundaries with a generic network for disparity, optical flow or scene flow estimation. In: Proceedings of the 2018 European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018, 626−643
    [28] Wannenwetsch A S, Kiefel M, Gehler P V, Roth S. Learning task-specific generalized convolutions in the permutohedral lattice. In: Proceedings of the 2019 German Conference on Pattern Recognition (GCPR). Cham, Germany: Springer, 2019, 345−359
    [29] Kong L T, Yang X H, Yang J. OAS-Net: Occlusion aware sampling network for accurate optical flow. In: Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Canada: IEEE, 2021, 2475−2479
    [30] Tang C Z, Yuan L, Tan P. LSM: Learning subspace minimization for low-level vision. In: Proceedings of the 2020 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020, 6234−6245
    [31] Kong L T, Yang J. FDFlowNet: Fast optical flow estimation using a deep lightweight network. In: Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP). Anchorage, USA: IEEE, 2020, 1501−1505
Publication history
  • Received: 2022-05-22
  • Accepted: 2022-07-22
  • Available online: 2022-09-22
