Large Displacement Optical Flow Estimation Jointing Depthwise Over-parameterized Convolution and Cross Correlation Attention

WANG Zi-Ge; GE Li-Yue; CHEN Zhen; ZHANG Cong-Xuan; WANG Zi-Xu; SHU Ming-Yi

doi: 10.16383/j.aas.c230049

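WANG Zi-Ge^{1, 2},
GE Li-Yue^{1, 3},
CHEN Zhen^{1, 2, 4},
ZHANG Cong-Xuan^{1, 2, 4},
WANG Zi-Xu^{1, 2},
SHU Ming-Yi^{1, 2}

1. Key Laboratory of Jiangxi Province for Image Processing and Pattern Recognition, Nanchang Hangkong University, Nanchang 330063
2. School of Measuring and Optical Engineering, Nanchang Hangkong University, Nanchang 330063
3. School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing 100083
4. Key Laboratory of Nondestructive Testing, Ministry of Education, Nanchang Hangkong University, Nanchang 330063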

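Funds: Supported by the National Natural Science Foundation of China (62222206, 62272209), the National Science and Technology Major Project of Jiangxi Province (20232ACC01007), the Key Research and Development Program of Jiangxi Province (20232BBE50006), the Technological Innovation Guidance Program of Jiangxi Province (2021AEI91005), the Science and Technology Program of the Education Department of Jiangxi Province (GJJ210910), and the Open Fund of the Jiangxi Key Laboratory for Image Processing and Pattern Recognition (ET202104413)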

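Author Bio:
WANG Zi-Ge: Master's student at the School of Measuring and Optical Engineering, Nanchang Hangkong University. Her main research interest is computer vision. E-mail: Wangzggg@163.com

GE Li-Yue: Assistant experimenter at Nanchang Hangkong University and Ph.D. candidate at the School of Instrumentation and Optoelectronic Engineering, Beihang University. His research interests cover image detection and intelligent recognition. E-mail: lygeah@163.com

CHEN Zhen: Professor at the School of Measuring and Optical Engineering, Nanchang Hangkong University. He received his Ph.D. degree from Northwestern Polytechnical University in 2003. His research interests cover image processing and computer vision. E-mail: dr_chenzhen@163.com

ZHANG Cong-Xuan: Professor at the School of Measuring and Optical Engineering, Nanchang Hangkong University. He received his Ph.D. degree from Nanjing University of Aeronautics and Astronautics in 2014. His research interests cover image processing and computer vision. Corresponding author of this paper. E-mail: zcxdsg@163.com

WANG Zi-Xu: Master's student at the School of Measuring and Optical Engineering, Nanchang Hangkong University. His main research interest is computer vision. E-mail: wangzixu0827@163.com

SHU Ming-Yi: Master's student at the School of Measuring and Optical Engineering, Nanchang Hangkong University. His main research interest is computer vision. E-mail: shumingyi1997@163.com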

Publication history
- Received: 2023-02-10
- Accepted: 2023-08-29
- Published online: 2023-10-07
- Issue published: 2024-08-22


Abstract: To improve the accuracy and robustness of deep-learning-based optical flow models in large displacement scenes, we propose an optical flow estimation method that combines depthwise over-parameterized convolution with cross correlation attention. First, we construct a depthwise over-parameterized convolution by composing a depthwise convolution with a standard convolution and use it in place of ordinary convolution. This extracts more features and accelerates the convergence of network training, improving optical flow accuracy without increasing the inference cost of the network. Second, we design a feature extraction encoder based on cross correlation attention, which obtains a larger receptive field by stacking attention layers and thereby extracts multi-scale, long-range context features. This improves the robustness of optical flow estimation in large displacement scenes. Finally, we build the complete optical flow network from both components using a pyramid residual iteration model, which improves overall estimation performance. We compare our method with representative existing approaches on the MPI-Sintel and KITTI test sets. The experimental results show that the proposed method achieves good optical flow estimation performance and, in particular, better accuracy and robustness in large displacement regions.
Keywords:
- Optical flow
- large displacement
- cross correlation attention
- depthwise over-parameterized convolution
- deep learning
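The abstract's claim that depthwise over-parameterized convolution adds no inference cost can be illustrated with a small sketch. It assumes the kernel-folding formulation of DO-Conv [31], in which a depthwise kernel and a conventional kernel are composed into one ordinary kernel after training. The function names, tensor layouts, and sizes below are hypothetical and are not taken from the paper; only the Python standard library is used.

```python
import math
import random


def random_tensor(rng, shape):
    """Nested lists of uniform random floats with the given shape."""
    if len(shape) == 1:
        return [rng.uniform(-1.0, 1.0) for _ in range(shape[0])]
    return [random_tensor(rng, shape[1:]) for _ in range(shape[0])]


def fold_kernel(depthwise, conventional):
    """Fold D[s][d][c] into W[o][d][c]: K[o][s][c] = sum_d D[s][d][c] * W[o][d][c]."""
    n_pos, d_mul, c_in = len(depthwise), len(depthwise[0]), len(depthwise[0][0])
    return [[[sum(depthwise[s][d][c] * w_o[d][c] for d in range(d_mul))
              for c in range(c_in)]
             for s in range(n_pos)]
            for w_o in conventional]


def apply_two_step(patch, depthwise, conventional):
    """Over-parameterized view: depthwise step, then conventional step."""
    n_pos, d_mul, c_in = len(depthwise), len(depthwise[0]), len(depthwise[0][0])
    hidden = [[sum(depthwise[s][d][c] * patch[s][c] for s in range(n_pos))
               for c in range(c_in)]
              for d in range(d_mul)]
    return [sum(w_o[d][c] * hidden[d][c]
                for d in range(d_mul) for c in range(c_in))
            for w_o in conventional]


def apply_folded(patch, kernel):
    """Inference view: one ordinary convolution step on a single patch."""
    return [sum(k_o[s][c] * patch[s][c]
                for s in range(len(patch)) for c in range(len(patch[0])))
            for k_o in kernel]


rng = random.Random(0)
k, c_in, c_out = 3, 4, 5            # hypothetical layer sizes
n_pos = d_mul = k * k               # assumed depth multiplier equal to k * k
depthwise = random_tensor(rng, (n_pos, d_mul, c_in))
conventional = random_tensor(rng, (c_out, d_mul, c_in))
patch = random_tensor(rng, (n_pos, c_in))   # one k x k receptive field

folded_kernel = fold_kernel(depthwise, conventional)
two_step_out = apply_two_step(patch, depthwise, conventional)
folded_out = apply_folded(patch, folded_kernel)
outputs_match = all(math.isclose(a, b, abs_tol=1e-9)
                    for a, b in zip(two_step_out, folded_out))
print(outputs_match)
```

The folded kernel has the same shape as an ordinary k x k kernel (`c_out` x `k*k` x `c_in`), so under this assumption the extra depthwise parameters affect training only.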


Fig. 1 Structure diagram of the large displacement optical flow estimation network based on depthwise over-parameterized convolution and cross correlation attention

Fig. 2 Structure diagram of conventional convolution and depthwise over-parameterized convolution

Fig. 3 The operation of depthwise over-parameterized convolution

Fig. 4 Comparison of feature maps of different optical flow models

Fig. 5 The cross correlation attention block
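The abstract states that stacking attention layers enlarges the receptive field. The page does not describe the internals of the cross correlation attention block, so the sketch below assumes it behaves like criss-cross attention [33]: each position attends only to its own row and column. Learned query, key, and value projections are omitted, and `criss_cross_attention` is a hypothetical name. The demo checks that a pixel off the cross through (0, 0) cannot influence that position after one layer but does after two.

```python
import copy
import math


def criss_cross_attention(feat):
    """feat[i][j] is a list of channel values; attend along row i and column j."""
    h, w, channels = len(feat), len(feat[0]), len(feat[0][0])
    out = []
    for i in range(h):
        out_row = []
        for j in range(w):
            query = feat[i][j]
            # Positions on the cross through (i, j); the centre is counted once.
            path = [(i, x) for x in range(w)] + [(y, j) for y in range(h) if y != i]
            scores = [sum(q * kv for q, kv in zip(query, feat[y][x]))
                      for y, x in path]
            top = max(scores)                       # stabilise the softmax
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            out_row.append([
                sum(wt * feat[y][x][c] for wt, (y, x) in zip(weights, path)) / total
                for c in range(channels)
            ])
        out.append(out_row)
    return out


base = [[[0.1 * i, 0.1 * j] for j in range(3)] for i in range(3)]
perturbed = copy.deepcopy(base)
perturbed[1][1] = [5.0, -3.0]   # pixel that is not on the cross through (0, 0)

one_layer_same = (criss_cross_attention(base)[0][0]
                  == criss_cross_attention(perturbed)[0][0])
two_layers_same = (criss_cross_attention(criss_cross_attention(base))[0][0]
                   == criss_cross_attention(criss_cross_attention(perturbed))[0][0])
print(one_layer_same, two_layers_same)
```

With two stacked layers every position can be reached through an intermediate row or column position, which is one way to read the "larger receptive field" claim.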

Fig. 6 Structure diagram of the optical flow feature encoder network based on cross correlation attention

Fig. 7 Comparison of results of different optical flow models

Fig. 8 Visualization of feature maps of different sequences in the Clean and Final datasets (the red boxes mark edge feature results with significant differences)

Fig. 9 Visualization of feature maps of objects at different scales for different numbers of pyramid levels

Fig. 10 Visualization of flow fields estimated by the compared methods on MPI-Sintel test sequences

Fig. 11 Flow error maps of the compared methods on KITTI2015 test sequences

Fig. 12 The training process of Baseline_deconv on each dataset

Fig. 13 Visual comparison of the ablation models' optical flow results on the MPI-Sintel test dataset

Fig. 14 Visual comparison of the ablation models' optical flow results on the KITTI2015 test dataset

Table 1 Optical flow estimation results on image sequences of the MPI-Sintel dataset (pixels)

Method	Clean All	Clean Matched	Clean Unmatched	Final All	Final Matched	Final Unmatched
IRR-PWC^[14]	3.844	1.472	23.220	4.579	2.154	24.355
PPAC-HD3^[36]	4.589	1.507	29.751	4.599	2.116	24.852
LiteFlowNet2^[37]	3.483	1.383	20.637	4.686	2.248	24.571
IOFPL-ft^[38]	4.394	1.611	27.128	4.224	1.956	22.704
PWC-Net^[25]	4.386	1.719	26.166	5.042	2.445	26.221
HMFlow^[39]	3.206	1.122	20.210	5.038	2.404	26.535
SegFlow153^[40]	4.151	1.246	27.855	6.191	2.940	32.682
SAMFL^[41]	4.477	1.763	26.643	4.765	2.282	25.008
Proposed method	2.763	1.062	16.656	4.202	2.056	21.696
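The All, Matched, and Unmatched columns of Table 1 are mean endpoint errors in pixels, as reported by the MPI-Sintel benchmark. A minimal sketch of that metric follows; the helper names and the toy flow vectors are hypothetical, and the visibility mask stands in for the benchmark's matched/unmatched pixel labels.

```python
import math


def endpoint_errors(pred, truth):
    """Per-pixel Euclidean distance between predicted and true flow vectors."""
    return [math.hypot(pu - tu, pv - tv)
            for (pu, pv), (tu, tv) in zip(pred, truth)]


def mean_epe(pred, truth, mask=None):
    """Mean endpoint error, optionally restricted to pixels where mask is True."""
    errors = endpoint_errors(pred, truth)
    if mask is not None:
        errors = [e for e, keep in zip(errors, mask) if keep]
    return sum(errors) / len(errors)


# Toy data: three pixels, the last one not visible in the second frame.
pred = [(1.0, 0.0), (0.0, 0.0), (3.0, 4.0)]
truth = [(1.0, 0.0), (0.0, 2.0), (0.0, 0.0)]
visible = [True, True, False]

epe_all = mean_epe(pred, truth)
epe_matched = mean_epe(pred, truth, visible)
epe_unmatched = mean_epe(pred, truth, [not v for v in visible])
print(epe_all, epe_matched, epe_unmatched)
```

Unmatched pixels have no correspondence in the adjacent frame, which is why that column is much larger than Matched for every method.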

Table 2 Comparison of motion edge and large displacement metrics on the MPI-Sintel dataset (pixels)

Method	Clean $d_{0\text{-}10}$	Clean $d_{10\text{-}60}$	Clean $d_{60\text{-}140}$	Clean $s_{0\text{-}10}$	Clean $s_{10\text{-}40}$	Clean $s_{40+}$	Final $d_{0\text{-}10}$	Final $d_{10\text{-}60}$	Final $d_{60\text{-}140}$	Final $s_{0\text{-}10}$	Final $s_{10\text{-}40}$	Final $s_{40+}$
IRR-PWC^[14]	3.509	1.296	0.721	0.535	1.724	25.430	4.165	1.843	1.292	0.709	2.423	28.998
PPAC-HD3^[36]	2.788	1.340	1.068	0.355	1.289	33.624	3.521	1.702	1.637	0.617	2.083	30.457
LiteFlowNet2^[37]	3.293	1.263	0.629	0.597	1.772	21.976	4.048	1.899	1.473	0.811	2.433	29.375
IOFPL-ft^[38]	3.059	1.421	0.943	0.391	1.292	31.812	3.288	1.479	1.419	0.646	1.897	27.596
PWC-Net^[25]	4.282	1.657	0.674	0.606	2.070	28.793	4.636	2.087	1.475	0.799	2.986	31.070
HMFlow^[39]	2.786	0.957	0.584	0.467	1.693	20.470	4.582	2.213	1.465	0.926	3.170	29.974
SegFlow153^[40]	3.072	1.143	0.656	0.486	2.000	27.563	4.969	2.492	2.119	1.201	3.865	36.570
SAMFL^[41]	3.946	1.623	0.811	0.618	1.860	29.995	4.208	1.846	1.449	0.893	2.587	29.232
Proposed method	2.772	0.854	0.443	0.541	1.621	16.575	3.884	1.660	1.292	0.753	2.381	25.715
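In Table 2 the $s$ columns group pixels by ground-truth speed in pixels per frame, so $s_{40+}$ is the large displacement metric; the $d$ columns group pixels by distance to the nearest occlusion boundary. The sketch below shows speed binning only. The half-open bin convention and the function name are assumptions, not taken from the benchmark code.

```python
import math


def epe_by_speed(pred, truth, edges=(0.0, 10.0, 40.0)):
    """Mean endpoint error per ground-truth speed bin [edges[n], edges[n+1])."""
    bounds = list(edges) + [math.inf]
    sums = [0.0] * len(edges)
    counts = [0] * len(edges)
    for (pu, pv), (tu, tv) in zip(pred, truth):
        speed = math.hypot(tu, tv)
        error = math.hypot(pu - tu, pv - tv)
        for n in range(len(edges)):
            if bounds[n] <= speed < bounds[n + 1]:
                sums[n] += error
                counts[n] += 1
                break
    # Empty bins are reported as None rather than dividing by zero.
    return [s / c if c else None for s, c in zip(sums, counts)]


truth = [(3.0, 4.0), (0.0, 20.0), (30.0, 40.0), (0.0, 10.0)]   # speeds 5, 20, 50, 10
pred = [(3.0, 5.0), (2.0, 20.0), (30.0, 50.0), (3.0, 10.0)]    # errors 1, 2, 10, 3
s_0_10, s_10_40, s_40_plus = epe_by_speed(pred, truth)
print(s_0_10, s_10_40, s_40_plus)
```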

Table 3 Results on the KITTI2015 dataset (%)

Method	$Fl\text{-}bg$	$Fl\text{-}fg$	$Fl\text{-}all$
IRR-PWC^[14]	7.68	7.52	7.65
PPAC-HD3^[36]	5.78	7.48	6.06
LiteFlowNet2^[37]	7.62	7.64	7.62
IOFPL-ft^[38]	-	-	6.52
PWC-Net^[25]	9.66	9.31	9.60
SegFlow153^[40]	22.21	23.72	22.46
SAMFL^[41]	7.72	7.43	7.68
Proposed method	7.43	6.65	7.30
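The $Fl$ values in Table 3 are outlier percentages. Under the KITTI 2015 convention a pixel counts as an outlier when its endpoint error exceeds both 3 pixels and 5% of the ground-truth flow magnitude; $Fl\text{-}bg$, $Fl\text{-}fg$, and $Fl\text{-}all$ apply this to background, foreground, and all pixels. A minimal sketch with a hypothetical function name and toy data:

```python
import math


def fl_outlier_rate(pred, truth, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of pixels whose error exceeds both thresholds."""
    outliers = 0
    for (pu, pv), (tu, tv) in zip(pred, truth):
        error = math.hypot(pu - tu, pv - tv)
        magnitude = math.hypot(tu, tv)
        if error > abs_thresh and error > rel_thresh * magnitude:
            outliers += 1
    return 100.0 * outliers / len(truth)


truth = [(100.0, 0.0), (10.0, 0.0), (1.0, 0.0), (0.0, 50.0)]
pred = [(104.0, 0.0),   # error 4 px, but under 5% of a 100 px motion: inlier
        (14.0, 0.0),    # error 4 px and 40% of the motion: outlier
        (2.0, 0.0),     # error 1 px: inlier
        (0.0, 50.0)]    # exact: inlier
fl_all = fl_outlier_rate(pred, truth)
print(fl_all)
```

The relative threshold is what keeps a 4-pixel error on a 100-pixel motion from being penalised, which matters in large displacement scenes.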

Table 4 Comparison of ablation results on the MPI-Sintel dataset (pixels)

Ablation model	All	Matched	Unmatched	$s_{10\text{-}40}$	$s_{40+}$
Baseline	3.844	1.472	23.220	1.724	25.430
Baseline_CS	2.892	1.070	17.765	1.662	17.460
Baseline_deconv	3.621	1.461	21.272	1.659	23.482
Full model	2.763	1.062	16.656	1.621	16.575
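To make the ablation in Table 4 easier to read, the relative error reduction of the full model over the baseline can be computed directly from the tabulated values. This is plain arithmetic on the numbers above, not an additional experiment; the dictionary layout and function name are illustrative.

```python
baseline = {"All": 3.844, "Matched": 1.472, "Unmatched": 23.220,
            "s10-40": 1.724, "s40+": 25.430}
full_model = {"All": 2.763, "Matched": 1.062, "Unmatched": 16.656,
              "s10-40": 1.621, "s40+": 16.575}


def relative_reduction(before, after):
    """Error reduction as a percentage of the starting error."""
    return 100.0 * (before - after) / before


reduction = {name: relative_reduction(baseline[name], full_model[name])
             for name in baseline}
for name, value in reduction.items():
    print(f"{name}: {value:.1f}%")
```

By this calculation the overall endpoint error drops by about 28.1% and the $s_{40+}$ error by about 34.8%, so the largest relative gain is in the large displacement bin.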

Table 5 Comparison of ablation results on the KITTI2015 dataset

Ablation model	$Fl\text{-}bg$ (%)	$Fl\text{-}fg$ (%)	$Fl\text{-}all$ (%)	Training time (min)
Baseline	7.68	7.52	7.65	621
Baseline_CS	7.74	7.58	7.71	690
Baseline_deconv	7.28	7.30	7.29	632
Full model	7.43	6.65	7.30	616

References (42)

[1]	Zhang Jiao-Yang, Cong Shuang, Kuang Sen. Real-time state estimation and feedback control for n-qubit stochastic quantum systems. Acta Automatica Sinica, 2024, 50(1): 42−53
[2]	Zhang Wei, Huang Wei-Min. Multi-strategy adaptive multi-objective particle swarm optimization algorithm based on swarm partition. Acta Automatica Sinica, 2022, 48(10): 2585−2599 doi: 10.16383/j.aas.c200307
[3]	Zhang Fang, Zhao Dong-Xu, Xiao Zhi-Tao, Geng Lei, Wu Jun, Liu Yan-Bei. Research progress of single image super-resolution reconstruction technology. Acta Automatica Sinica, 2022, 48(11): 2634−2654 doi: 10.16383/j.aas.c200777
[4]	Yang Tian-Jin, Hou Zhen-Jie, Li Xing, Liang Jiu-Zhen, Huan Juan, Zheng Ji-Xiang. Recognizing action using multi-center subspace learning-based spatial-temporal information fusion. Acta Automatica Sinica, 2022, 48(11): 2823−2835 doi: 10.16383/j.aas.c190327
[5]	Yan Meng-Kai, Qian Jian-Jun, Yang Jian. Weakly aligned cross-spectral face detection. Acta Automatica Sinica, 2023, 49(1): 135−147 doi: 10.16383/j.aas.c210058
[6]	Guo Ying-Chun, Feng Fang, Yan Gang, Hao Xiao-Ke. Cross-domain person re-identification on adaptive fusion network. Acta Automatica Sinica, 2022, 48(11): 2744−2756 doi: 10.16383/j.aas.c220083
[7]	Horn B K P, Schunck B G. Determining optical flow. Artificial Intelligence, 1981, 17(1−3): 185−203 doi: 10.1016/0004-3702(81)90024-2
[8]	Sun D Q, Roth S, Black M J. Secrets of optical flow estimation and their principles. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, USA: IEEE, 2010. 2432−2439
[9]	Menze M, Heipke C, Geiger A. Discrete optimization for optical flow. In: Proceedings of the 37th German Conference Pattern Recognition (GCPR). Aachen, Germany: Springer, 2015. 16−28
[10]	Chen Q F, Koltun V. Full flow: Optical flow estimation by global optimization over regular grids. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 4706−4714
[11]	Dosovitskiy A, Fischer P, Ilg E, Häusser P, Hazirbas C, Golkov V. FlowNet: Learning optical flow with convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 2758−2766
[12]	Ranjan A, Black M J. Optical flow estimation using a spatial pyramid network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 2720−2729
[13]	Amiaz T, Lubetzky E, Kiryati N. Coarse to over-fine optical flow estimation. Pattern Recognition, 2007, 40(9): 2496−2503 doi: 10.1016/j.patcog.2006.09.011
[14]	Hur J, Roth S. Iterative residual refinement for joint optical flow and occlusion estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 5754−5763
[15]	Tu Z G, Xie W, Zhang D J, Poppe R, Veltkamp R C, Li B X, et al. A survey of variational and CNN-based optical flow techniques. Signal Processing: Image Communication, 2019, 72: 9−24 doi: 10.1016/j.image.2018.12.002
[16]	Zhang C X, Ge L Y, Chen Z, Li M, Liu W, Chen H. Refined TV-L₁ optical flow estimation using joint filtering. IEEE Transactions on Multimedia, 2020, 22(2): 349−364 doi: 10.1109/TMM.2019.2929934
[17]	Dalca A V, Rakic M, Guttag J, Sabuncu M R. Learning conditional deformable templates with convolutional networks. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc., 2019. Article No. 32
[18]	Chen J, Lai J H, Cai Z M, Xie X H, Pan Z G. Optical flow estimation based on the frequency-domain regularization. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(1): 217−230 doi: 10.1109/TCSVT.2020.2974490
[19]	Zhai M L, Xiang X Z, Lv N, Kong X D. Optical flow and scene flow estimation: A survey. Pattern Recognition, 2021, 114: Article No. 107861 doi: 10.1016/j.patcog.2021.107861
[20]	Zach C, Pock T, Bischof H. A duality based approach for realtime TV-L₁ optical flow. In: Proceedings of the 29th DAGM Symposium on Pattern Recognition. Heidelberg, Germany: Springer, 2007. 214−223
[21]	Zhao S Y, Zhao L, Zhang Z X, Zhou E Y, Metaxas D. Global matching with overlapping attention for optical flow estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 17571−17580
[22]	Li Z W, Liu F, Yang W J, Peng S H, Zhou J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(12): 6999−7019 doi: 10.1109/TNNLS.2021.3084827
[23]	Han J W, Yao X W, Cheng G, Feng X X, Xu D. P-CNN: Part-based convolutional neural networks for fine-grained visual categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(2): 579−590 doi: 10.1109/TPAMI.2019.2933510
[24]	Ilg E, Mayer N, Saikia T, Keuper M, Dosovitskiy A, Brox T. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 1647−1655
[25]	Sun D Q, Yang X D, Liu M Y, Kautz J. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE, 2018. 8934−8943
[26]	Wang Z G, Chen Z, Zhang C X, Zhou Z K, Chen H. LCIF-Net: Local criss-cross attention based optical flow method using multi-scale image features and feature pyramid. Signal Processing: Image Communication, 2023, 112: Article No. 116921 doi: 10.1016/j.image.2023.116921
[27]	Teed Z, Deng J. RAFT: Recurrent all-pairs field transforms for optical flow. In: Proceedings of the 16th European Conference on Computer Vision (ECCV). Glasgow, UK: Springer, 2020. 402−419
[28]	Han K, Xiao A, Wu E H, Guo J Y, Xu C J, Wang Y H. Transformer in transformer. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. Montreal, Canada: NIPS, 2021. 15908−15919
[29]	Jiang S H, Campbell D, Lu Y, Li H D, Hartley R. Learning to estimate hidden motions with global motion aggregation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: 2021. 9752−9761
[30]	Xu H F, Zhang J, Cai J F, Rezatofighi H, Tao D C. GMFlow: Learning optical flow via global matching. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 8111−8120
[31]	Cao J M, Li Y Y, Sun M C, Chen Y, Lischinski D, Cohen-Or D, et al. DO-Conv: Depthwise over-parameterized convolutional layer. IEEE Transactions on Image Processing, 2022, 31: 3726−3736 doi: 10.1109/TIP.2022.3175432
[32]	Dong X Y, Bao J M, Chen D D, Zhang W M, Yu N H, Yuan L, et al. CSWin transformer: A general vision transformer backbone with cross-shaped windows. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 12114−12124
[33]	Huang Z L, Wang X G, Huang L C, Huang C, Wei Y C, Liu W Y. CCNet: Criss-cross attention for semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE, 2019. 603−612
[34]	Butler D J, Wulff J, Stanley G B, Black M J. A naturalistic open source movie for optical flow evaluation. In: Proceedings of the 12th European Conference on Computer Vision (ECCV). Florence, Italy: Springer, 2012. 611−625
[35]	Menze M, Geiger A. Object scene flow for autonomous vehicles. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE, 2015. 3061−3070
[36]	Wannenwetsch A S, Roth S. Probabilistic pixel-adaptive refinement networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 11639−11648
[37]	Hui T W, Tang X O, Loy C C. A lightweight optical flow CNN: Revisiting data fidelity and regularization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(8): 2555−2569 doi: 10.1109/TPAMI.2020.2976928
[38]	Hofinger M, Bulò S R, Porzi L, Knapitsch A, Pock T, Kontschieder P. Improving optical flow on a pyramid level. In: Proceedings of the 16th European Conference on Computer Vision (ECCV). Glasgow, UK: Springer, 2020. 770−786
[39]	Yu S H J, Zhang Y M, Wang C, Bai X, Zhang L, Hancock E R. HMFlow: Hybrid matching optical flow network for small and fast-moving objects. In: Proceedings of the 25th International Conference on Pattern Recognition (ICPR). Milan, Italy: IEEE, 2021. 1197−1204
[40]	Chen J, Cai Z M, Lai J H, Xie X H. Efficient segmentation-based PatchMatch for large displacement optical flow estimation. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(12): 3595−3607 doi: 10.1109/TCSVT.2018.2885246
[41]	Zhang C X, Zhou Z K, Chen Z, Hu W M, Li M, Jiang S F. Self-attention-based multiscale feature learning optical flow with occlusion feature map prediction. IEEE Transactions on Multimedia, 2022, 24: 3340−3354 doi: 10.1109/TMM.2021.3096083
[42]	Lu Z H, Xie H T, Liu C B, Zhang Y D. Bridging the gap between vision transformers and convolutional neural networks on small datasets. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. New Orleans, USA: 2022. 14663−14677