

Large Displacement Optical Flow Estimation Jointing Depthwise Over-parameterized Convolution and Cross Correlation Attention

Wang Zi-Ge, Ge Li-Yue, Chen Zhen, Zhang Cong-Xuan, Wang Zi-Xu, Shu Ming-Yi

Citation: Wang Zi-Ge, Ge Li-Yue, Chen Zhen, Zhang Cong-Xuan, Wang Zi-Xu, Shu Ming-Yi. Large displacement optical flow estimation jointing depthwise over-parameterized convolution and cross correlation attention. Acta Automatica Sinica, 2024, 50(6): 1−15 doi: 10.16383/j.aas.c230049


doi: 10.16383/j.aas.c230049

Large Displacement Optical Flow Estimation Jointing Depthwise Over-parameterized Convolution and Cross Correlation Attention

Funds: Supported by National Natural Science Foundation of China (62222206, 62272209), National Science and Technology Major Project of Jiangxi Province (20232ACC01007), Key Research and Development Program of Jiangxi Province (20232BBE50006), the Technological Innovation Guidance Program of Jiangxi Province (2021AEI91005), Science and Technology Program of Education Department of Jiangxi Province (GJJ210910), and the Open Fund of Jiangxi Key Laboratory for Image Processing and Pattern Recognition (ET202104413)
More Information
    Author Bio:

    WANG Zi-Ge Master student at the School of Measuring and Optical Engineering, Nanchang Hangkong University. Her main research interest is computer vision

    GE Li-Yue Assistant experimenter at the School of Information Engineering, Nanchang Hangkong University. Ph.D. candidate at the School of Instrumentation and Optoelectronic Engineering, Beihang University. His research interest covers image detection and intelligent recognition

    CHEN Zhen Professor at the School of Measuring and Optical Engineering, Nanchang Hangkong University. He received his Ph.D. degree from Northwestern Polytechnical University in 2003. His research interest covers image processing and computer vision

    ZHANG Cong-Xuan Professor at the School of Measuring and Optical Engineering, Nanchang Hangkong University. He received his Ph.D. degree from Nanjing University of Aeronautics and Astronautics in 2014. His research interest covers image processing and computer vision. Corresponding author of this paper

    WANG Zi-Xu Master student at the School of Measuring and Optical Engineering, Nanchang Hangkong University. His main research interest is computer vision

    SHU Ming-Yi Master student at the School of Measuring and Optical Engineering, Nanchang Hangkong University. His main research interest is computer vision

  • Abstract: To address the limited accuracy and robustness of existing deep-learning optical flow models in large displacement scenes, this paper proposes an image sequence optical flow estimation method jointing depthwise over-parameterized convolution and cross correlation attention. First, a depthwise over-parameterized convolution is constructed by combining a depthwise convolution with a standard convolution and is used in place of ordinary convolution; it extracts richer features and accelerates the convergence of network training, improving the accuracy of optical flow estimation without increasing the inference cost. Second, a feature-extraction encoder network based on cross correlation attention is designed; stacking attention layers enlarges the receptive field so that multi-scale, long-range contextual features can be extracted, strengthening the robustness of optical flow estimation in large displacement scenes. Finally, a pyramid residual iteration model is adopted to build the overall optical flow estimation network jointing depthwise over-parameterized convolution and cross correlation attention, improving overall estimation performance. Comprehensive comparisons of the proposed method with representative optical flow methods on the MPI-Sintel and KITTI test sets show that it achieves strong optical flow estimation performance, with better accuracy and robustness in large displacement scenes in particular.
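The abstract's claim that the over-parameterization adds no inference cost can be illustrated with a small sketch: the depthwise part and the conventional part fold into a single kernel at inference time. The snippet below is a hypothetical 1-D NumPy reconstruction based on the DO-Conv formulation of reference [31], not the authors' code; all function names and shapes are illustrative assumptions.

```python
import numpy as np

def fold_kernels(W, D):
    """Fold the two trainable kernels into one inference-time kernel.

    W: conventional part, shape (C_out, C_in, D_mul)
    D: depthwise part, shape (C_in, D_mul, K)
    Returns the folded kernel W', shape (C_out, C_in, K):
        W'[o, i, k] = sum_d W[o, i, d] * D[i, d, k]
    """
    return np.einsum('oid,idk->oik', W, D)

def conv1d_valid(x, kernel):
    """Naive multi-channel 'valid' correlation: x (C_in, L), kernel (C_out, C_in, K)."""
    c_out, c_in, k = kernel.shape
    length = x.shape[1] - k + 1
    y = np.zeros((c_out, length))
    for t in range(length):
        y[:, t] = np.einsum('oik,ik->o', kernel, x[:, t:t + k])
    return y

def do_conv_two_stage(x, W, D):
    """Training-time view: depthwise stage with D, then conventional mixing with W."""
    c_in, d_mul, k = D.shape
    length = x.shape[1] - k + 1
    # Per-channel depthwise stage: F[i, d, t] = sum_k D[i, d, k] * x[i, t + k]
    F = np.zeros((c_in, d_mul, length))
    for t in range(length):
        F[:, :, t] = np.einsum('idk,ik->id', D, x[:, t:t + k])
    # Conventional mixing stage: y[o, t] = sum_{i,d} W[o, i, d] * F[i, d, t]
    return np.einsum('oid,idt->ot', W, F)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 16))    # 3 input channels, length 16
W = rng.standard_normal((4, 3, 5))  # C_out = 4, C_in = 3, D_mul = 5
D = rng.standard_normal((3, 5, 3))  # depthwise multiplier, kernel size K = 3

y_two_stage = do_conv_two_stage(x, W, D)
y_folded = conv1d_valid(x, fold_kernels(W, D))
print(np.allclose(y_two_stage, y_folded))  # expected True: the fold is exact
```

Since both paths compute sum over (i, d, k) of W[o, i, d]·D[i, d, k]·x[i, t + k], the extra parameters exist only during training; after folding, inference is a single ordinary convolution.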
  • Fig. 1  Structure diagram of large displacement optical flow estimation based on depthwise over-parameterized convolution and cross correlation attention

    Fig. 2  Structure diagram of conventional convolution and depthwise over-parameterized convolution

    Fig. 3  The operation of depthwise over-parameterized convolution

    Fig. 4  Comparison of feature maps of different optical flow models

    Fig. 5  The cross correlation attention block

    Fig. 6  Structure diagram of the optical flow feature encoder network based on cross correlation attention

    Fig. 7  Comparison of results of different optical flow models

    Fig. 8  Visualization of feature maps of different sequences in the Clean and Final datasets (the red bounding boxes mark edge feature information with significant differences)

    Fig. 9  Visualization of feature maps at different scales under different layers of the pyramid

    Fig. 10  Flow field results of the comparable methods evaluated on some MPI-Sintel test datasets

    Fig. 11  Flow error maps of the comparable methods tested on the KITTI2015 datasets

    Fig. 12  The training process of Baseline_deconv on each dataset

    Fig. 13  Comparison of visualization results of each ablation model on the MPI-Sintel test datasets

    Fig. 14  Comparison of visualization results of each ablation model on the KITTI2015 datasets

    Table 1  Optical flow calculation results of image sequences in the MPI-Sintel dataset

    | Method           | Clean All | Clean Matched | Clean Unmatched | Final All | Final Matched | Final Unmatched |
    |------------------|-----------|---------------|-----------------|-----------|---------------|-----------------|
    | IRR-PWC[14]      | 3.844     | 1.472         | 23.220          | 4.579     | 2.154         | 24.355          |
    | PPAC-HD3[36]     | 4.589     | 1.507         | 29.751          | 4.599     | 2.116         | 24.852          |
    | LiteFlowNet2[37] | 3.483     | 1.383         | 20.637          | 4.686     | 2.248         | 24.571          |
    | IOFPL-ft[38]     | 4.394     | 1.611         | 27.128          | 4.224     | 1.956         | 22.704          |
    | PWC-Net[25]      | 4.386     | 1.719         | 26.166          | 5.042     | 2.445         | 26.221          |
    | HMFlow[39]       | 3.206     | 1.122         | 20.210          | 5.038     | 2.404         | 26.535          |
    | SegFlow153[40]   | 4.151     | 1.246         | 27.855          | 6.191     | 2.940         | 32.682          |
    | SAMFL[41]        | 4.477     | 1.763         | 26.643          | 4.765     | 2.282         | 25.008          |
    | Ours             | 2.763     | 1.062         | 16.656          | 4.202     | 2.056         | 21.696          |

    Table 2  Comparison results of motion edge and large displacement metrics in the MPI-Sintel dataset

    Clean:

    | Method           | d0-10 | d10-60 | d60-140 | s0-10 | s10-40 | s40+   |
    |------------------|-------|--------|---------|-------|--------|--------|
    | IRR-PWC[14]      | 3.509 | 1.296  | 0.721   | 0.535 | 1.724  | 25.430 |
    | PPAC-HD3[36]     | 2.788 | 1.340  | 1.068   | 0.355 | 1.289  | 33.624 |
    | LiteFlowNet2[37] | 3.293 | 1.263  | 0.629   | 0.597 | 1.772  | 21.976 |
    | IOFPL-ft[38]     | 3.059 | 1.421  | 0.943   | 0.391 | 1.292  | 31.812 |
    | PWC-Net[25]      | 4.282 | 1.657  | 0.674   | 0.606 | 2.070  | 28.793 |
    | HMFlow[39]       | 2.786 | 0.957  | 0.584   | 0.467 | 1.693  | 20.470 |
    | SegFlow153[40]   | 3.072 | 1.143  | 0.656   | 0.486 | 2.000  | 27.563 |
    | SAMFL[41]        | 3.946 | 1.623  | 0.811   | 0.618 | 1.860  | 29.995 |
    | Ours             | 2.772 | 0.854  | 0.443   | 0.541 | 1.621  | 16.575 |

    Final:

    | Method           | d0-10 | d10-60 | d60-140 | s0-10 | s10-40 | s40+   |
    |------------------|-------|--------|---------|-------|--------|--------|
    | IRR-PWC[14]      | 4.165 | 1.843  | 1.292   | 0.709 | 2.423  | 28.998 |
    | PPAC-HD3[36]     | 3.521 | 1.702  | 1.637   | 0.617 | 2.083  | 30.457 |
    | LiteFlowNet2[37] | 4.048 | 1.899  | 1.473   | 0.811 | 2.433  | 29.375 |
    | IOFPL-ft[38]     | 3.288 | 1.479  | 1.419   | 0.646 | 1.897  | 27.596 |
    | PWC-Net[25]      | 4.636 | 2.087  | 1.475   | 0.799 | 2.986  | 31.070 |
    | HMFlow[39]       | 4.582 | 2.213  | 1.465   | 0.926 | 3.170  | 29.974 |
    | SegFlow153[40]   | 4.969 | 2.492  | 2.119   | 1.201 | 3.865  | 36.570 |
    | SAMFL[41]        | 4.208 | 1.846  | 1.449   | 0.893 | 2.587  | 29.232 |
    | Ours             | 3.884 | 1.660  | 1.292   | 0.753 | 2.381  | 25.715 |

    Table 3  Calculation results on the KITTI2015 dataset (%)

    | Method           | Fl-bg | Fl-fg | Fl-all |
    |------------------|-------|-------|--------|
    | IRR-PWC[14]      | 7.68  | 7.52  | 7.65   |
    | PPAC-HD3[36]     | 5.78  | 7.48  | 6.06   |
    | LiteFlowNet2[37] | 7.62  | 7.64  | 7.62   |
    | IOFPL-ft[38]     | —     | —     | 6.52   |
    | PWC-Net[25]      | 9.66  | 9.31  | 9.60   |
    | SegFlow153[40]   | 22.21 | 23.72 | 22.46  |
    | SAMFL[41]        | 7.72  | 7.43  | 7.68   |
    | Ours             | 7.43  | 6.65  | 7.30   |

    Table 4  Comparison of ablation experiment results on the MPI-Sintel dataset

    | Ablation model  | All   | Matched | Unmatched | s10-40 | s40+   |
    |-----------------|-------|---------|-----------|--------|--------|
    | Baseline        | 3.844 | 1.472   | 23.220    | 1.724  | 25.430 |
    | Baseline_CS     | 2.892 | 1.070   | 17.765    | 1.662  | 17.460 |
    | Baseline_deconv | 3.621 | 1.461   | 21.272    | 1.659  | 23.482 |
    | Full model      | 2.763 | 1.062   | 16.656    | 1.621  | 16.575 |
  • [1] Zhang Jiao-Yang, Cong Shuang, Kuang Sen. Real-time state estimation and feedback control for n-qubit stochastic quantum systems. Acta Automatica Sinica, doi: 10.16383/j.aas.c210916
    [2] Zhang Wei, Huang Wei-Min. Multi-strategy adaptive multi-objective particle swarm optimization algorithm based on swarm partition. Acta Automatica Sinica, 2022, 48(10): 2585−2599 doi: 10.16383/j.aas.c200307
    [3] Zhang Fang, Zhao Dong-Xu, Xiao Zhi-Tao, Geng Lei, Wu Jun, Liu Yan-Bei. Research progress of single image super-resolution reconstruction technology. Acta Automatica Sinica, 2022, 48(11): 2634−2654 doi: 10.16383/j.aas.c200777
    [4] Yang Tian-Jin, Hou Zhen-Jie, Li Xing, Liang Jiu-Zhen, Huan Juan, Zheng Ji-Xiang. Recognizing action using multi-center subspace learning-based spatial-temporal information fusion. Acta Automatica Sinica, 2022, 48(11): 2823−2835 doi: 10.16383/j.aas.c190327
    [5] Yan Meng-Kai, Qian Jian-Jun, Yang Jian. Weakly aligned cross-spectral face detection. Acta Automatica Sinica, 2023, 49(1): 135−147 doi: 10.16383/j.aas.c210058
    [6] Guo Ying-Chun, Feng Fang, Yan Gang, Hao Xiao-Ke. Cross-domain person re-identification on adaptive fusion network. Acta Automatica Sinica, 2022, 48(11): 2744−2756 doi: 10.16383/j.aas.c220083
    [7] Horn B K P, Schunck B G. Determining optical flow. Artificial Intelligence, 1981, 17(1-3): 185−203 doi: 10.1016/0004-3702(81)90024-2
    [8] Sun D Q, Roth S, Black M J. Secrets of optical flow estimation and their principles. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, USA: IEEE, 2010. 2432−2439
    [9] Menze M, Heipke C, Geiger A. Discrete optimization for optical flow. In: Proceedings of the 37th German Conference Pattern Recognition (GCPR). Aachen, Germany: Springer, 2015. 16−28
    [10] Chen Q F, Koltun V. Full flow: Optical flow estimation by global optimization over regular grids. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 4706−4714
    [11] Dosovitskiy A, Fischer P, Ilg E, Häusser P, Hazirbas C, Golkov V. FlowNet: Learning optical flow with convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 2758−2766
    [12] Ranjan A, Black M J. Optical flow estimation using a spatial pyramid network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 2720−2729
    [13] Amiaz T, Lubetzky E, Kiryati N. Coarse to over-fine optical flow estimation. Pattern Recognition, 2007, 40(9): 2496−2503 doi: 10.1016/j.patcog.2006.09.011
    [14] Hur J, Roth S. Iterative residual refinement for joint optical flow and occlusion estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 5754−5763
    [15] Tu Z G, Xie W, Zhang D J, Poppe R, Veltkamp R C, Li B X, et al. A survey of variational and CNN-based optical flow techniques. Signal Processing: Image Communication, 2019, 72: 9−24 doi: 10.1016/j.image.2018.12.002
    [16] Zhang C X, Ge L Y, Chen Z, Li M, Liu W, Chen H. Refined TV-L1 optical flow estimation using joint filtering. IEEE Transactions on Multimedia, 2020, 22(2): 349−364 doi: 10.1109/TMM.2019.2929934
    [17] Dalca A V, Rakic M, Guttag J, Sabuncu M R. Learning conditional deformable templates with convolutional networks. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc., 2019. Article No. 32
    [18] Chen J, Lai J H, Cai Z M, Xie X H, Pan Z G. Optical flow estimation based on the frequency-domain regularization. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(1): 217−230 doi: 10.1109/TCSVT.2020.2974490
    [19] Zhai M L, Xiang X Z, Lv N, Kong X D. Optical flow and scene flow estimation: A survey. Pattern Recognition, 2021, 114: Article No. 107861 doi: 10.1016/j.patcog.2021.107861
    [20] Zach C, Pock T, Bischof H. A duality based approach for realtime TV-L1 optical flow. In: Proceedings of the 29th DAGM Symposium on Pattern Recognition. Heidelberg, Germany: Springer, 2007. 214−223
    [21] Zhao S Y, Zhao L, Zhang Z X, Zhou E Y, Metaxas D. Global matching with overlapping attention for optical flow estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 17571−17580
    [22] Li Z W, Liu F, Yang W J, Peng S H, Zhou J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(12): 6999−7019 doi: 10.1109/TNNLS.2021.3084827
    [23] Han J W, Yao X W, Cheng G, Feng X X, Xu D. P-CNN: Part-based convolutional neural networks for fine-grained visual categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(2): 579−590 doi: 10.1109/TPAMI.2019.2933510
    [24] Ilg E, Mayer N, Saikia T, Keuper M, Dosovitskiy A, Brox T. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 1647−1655
    [25] Sun D Q, Yang X D, Liu M Y, Kautz J. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE, 2018. 8934−8943
    [26] Wang Z G, Chen Z, Zhang C X, Zhou Z K, Chen H. LCIF-Net: Local criss-cross attention based optical flow method using multi-scale image features and feature pyramid. Signal Processing: Image Communication, 2023, 112: Article No. 116921 doi: 10.1016/j.image.2023.116921
    [27] Teed Z, Deng J. RAFT: Recurrent all-pairs field transforms for optical flow. In: Proceedings of the 16th European Conference on Computer Vision (ECCV). Glasgow, UK: Springer, 2020. 402−419
    [28] Han K, Xiao A, Wu E H, Guo J Y, Xu C J, Wang Y H. Transformer in transformer. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. 2021. 15908−15919
    [29] Jiang S H, Campbell D, Lu Y, Li H D, Hartley R. Learning to estimate hidden motions with global motion aggregation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE, 2021. 9752−9761
    [30] Xu H F, Zhang J, Cai J F, Rezatofighi H, Tao D C. GMFlow: Learning optical flow via global matching. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 8111−8120
    [31] Cao J M, Li Y Y, Sun M C, Chen Y, Lischinski D, Cohen-Or D, et al. DO-Conv: Depthwise over-parameterized convolutional layer. IEEE Transactions on Image Processing, 2022, 31: 3726−3736 doi: 10.1109/TIP.2022.3175432
    [32] Dong X Y, Bao J M, Chen D D, Zhang W M, Yu N H, Yuan L, et al. CSWin transformer: A general vision transformer backbone with cross-shaped windows. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 12114−12124
    [33] Huang Z L, Wang X G, Huang L C, Huang C, Wei Y C, Liu W Y. CCNet: Criss-cross attention for semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE, 2019. 603−612
    [34] Butler D J, Wulff J, Stanley G B, Black M J. A naturalistic open source movie for optical flow evaluation. In: Proceedings of the 12th European Conference on Computer Vision (ECCV). Florence, Italy: Springer, 2012. 611−625
    [35] Menze M, Geiger A. Object scene flow for autonomous vehicles. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE, 2015. 3061−3070
    [36] Wannenwetsch A S, Roth S. Probabilistic pixel-adaptive refinement networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 11639−11648
    [37] Hui T W, Tang X O, Loy C C. A lightweight optical flow CNN—revisiting data fidelity and regularization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(8): 2555−2569 doi: 10.1109/TPAMI.2020.2976928
    [38] Hofinger M, Bulò S R, Porzi L, Knapitsch A, Pock T, Kontschieder P. Improving optical flow on a pyramid level. In: Proceedings of the 16th European Conference on Computer Vision (ECCV). Glasgow, UK: Springer, 2020. 770−786
    [39] Yu S H J, Zhang Y M, Wang C, Bai X, Zhang L, Hancock E R. HMFlow: Hybrid matching optical flow network for small and fast-moving objects. In: Proceedings of the 25th International Conference on Pattern Recognition (ICPR). Milan, Italy: IEEE, 2021. 1197−1204
    [40] Chen J, Cai Z M, Lai J H, Xie X H. Efficient segmentation-based PatchMatch for large displacement optical flow estimation. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(12): 3595−3607 doi: 10.1109/TCSVT.2018.2885246
    [41] Zhang C X, Zhou Z K, Chen Z, Hu W M, Li M, Jiang S F. Self-attention-based multiscale feature learning optical flow with occlusion feature map prediction. IEEE Transactions on Multimedia, 2022, 24: 3340−3354 doi: 10.1109/TMM.2021.3096083
    [42] Lu Z, Xie H, Liu C, et al. Bridging the gap between vision transformers and convolutional neural networks on small datasets. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. New Orleans, USA, 2022. 14663−14677
Publication history
  • Received: 2023-02-10
  • Accepted: 2023-08-29
  • Published online: 2023-10-07
