

A Feature Pyramid Optical Flow Estimation Method Based on Multi-scale Deformable Convolution

Fan Bing-Bing, Ge Li-Yue, Zhang Cong-Xuan, Li Bing, Feng Cheng, Chen Zhen

Citation: Fan Bing-Bing, Ge Li-Yue, Zhang Cong-Xuan, Li Bing, Feng Cheng, Chen Zhen. A feature pyramid optical flow estimation method based on multi-scale deformable convolution. Acta Automatica Sinica, 2022, 45(x): 1001−1013. doi: 10.16383/j.aas.c220142


A Feature Pyramid Optical Flow Estimation Method Based on Multi-scale Deformable Convolution

Funds: Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Program (2020AAA0105802, 2020AAA0105801, 2020AAA0105800), the National Key Research and Development Program of China (2020YFC2003800), the National Natural Science Foundation of China (62222206, 62272209, 61866026, 61772255, 61866025), the Technological Innovation Guidance Program of Jiangxi Province (20212AEI91005), the Science and Technology Program of the Education Department of Jiangxi Province (GJJ210910), the Advantage Science and Technology Innovation Team of Jiangxi Province (20165BCB19007), the Key Program of the Natural Science Foundation of Jiangxi Province (20202ACB214007), and the Open Fund of the Jiangxi Key Laboratory for Image Processing and Pattern Recognition (ET202104413)
More Information
    Author Bio:

    FAN Bing-Bing Master student at the School of Measuring and Optical Engineering, Nanchang Hangkong University. His main research interest is computer vision. E-mail: 1908080400123@stu.nchu.edu.cn

    GE Li-Yue Assistant experimenter at the School of Information Engineering, Nanchang Hangkong University. His research interest covers image detection and intelligent recognition. E-mail: lygeah@163.com

    ZHANG Cong-Xuan Professor at the School of Measuring and Optical Engineering, Nanchang Hangkong University. He received his Ph.D. degree from Nanjing University of Aeronautics and Astronautics in 2014. His research interest covers image processing and computer vision. Corresponding author of this paper. E-mail: zcxdsg@163.com

    LI Bing Researcher at the National Key Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. He received his Ph.D. degree from Beijing Jiaotong University in 2009. His research interest covers video content understanding and multimedia content security. E-mail: bli@nlpr.ia.ac.cn

    FENG Cheng Ph.D. candidate at the School of Instrumentation and Optoelectronic Engineering, Beihang University. His research interest covers image processing and computer vision. E-mail: fengcheng00016@163.com

    CHEN Zhen Professor at the School of Measuring and Optical Engineering, Nanchang Hangkong University. He received his Ph.D. degree from Northwestern Polytechnical University in 2003. His research interest covers image processing and computer vision. E-mail: dr_chenzhen@163.com

  • Abstract: To address the motion-edge blurring problem of existing deep-learning optical flow methods, a feature pyramid optical flow estimation method based on multi-scale deformable convolution is proposed. First, a feature extraction model based on multi-scale deformable convolution is constructed, which significantly improves the accuracy of feature extraction in image edge regions. Then, this feature extraction model is coupled with a feature pyramid optical flow network, yielding a feature pyramid optical flow estimation model based on multi-scale deformable convolution. Finally, a hybrid loss function combining image and motion-edge constraints is designed; by guiding the model to learn more accurate edge information, it overcomes the motion-edge blurring problem. The proposed method is comprehensively compared with representative deep-learning optical flow methods on the MPI-Sintel and KITTI2015 test sets. Experimental results show that it achieves higher optical flow accuracy and effectively resolves the edge blurring problem.
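The deformable convolution the abstract builds on replaces the fixed sampling grid of a standard convolution with learned per-position offsets, so the kernel can follow object boundaries. A minimal single-channel NumPy sketch of this sampling scheme (the function names and the `offsets` layout are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Bilinearly sample a 2D image at a real-valued location (y, x),
    treating out-of-bounds pixels as zero (zero padding)."""
    H, W = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    wy, wx = y - y0, x - x0

    def px(r, c):
        return img[r, c] if (0 <= r < H and 0 <= c < W) else 0.0

    return ((1 - wy) * (1 - wx) * px(y0, x0) + (1 - wy) * wx * px(y0, x0 + 1)
            + wy * (1 - wx) * px(y0 + 1, x0) + wy * wx * px(y0 + 1, x0 + 1))

def deformable_conv2d(img, weight, offsets):
    """Single-channel deformable convolution.
    img:     (H, W) input image
    weight:  (k, k) convolution kernel
    offsets: (H, W, k, k, 2) learned (dy, dx) displacement for every
             kernel tap at every output position
    Each tap samples the input at its regular grid position plus its
    learned offset, using bilinear interpolation for fractional locations."""
    H, W = img.shape
    k = weight.shape[0]
    r = k // 2
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    dy, dx = offsets[i, j, a, b]
                    acc += weight[a, b] * bilinear_sample(
                        img, i + a - r + dy, j + b - r + dx)
            out[i, j] = acc
    return out
```

With all offsets zero this reduces exactly to a standard convolution; in the network the offsets are themselves predicted by a small convolutional branch, which is what lets the effective receptive field deform around motion edges.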
  • Fig. 1 Schematic diagram of standard convolution and deformable convolution image feature extraction, with the corresponding optical flow estimation results

    Fig. 2 Schematic diagram of the multi-scale deformable convolution feature extraction network structure

    Fig. 3 Visual comparison of the feature extraction results of the proposed method and standard convolution

    Fig. 4 Feature pyramid optical flow network model based on multi-scale deformable convolution

    Fig. 5 Comparison of optical flow results of multi-scale deformable convolution models with different numbers of layers

    Fig. 6 Training models with different loss functions

    Fig. 7 Visual comparison of optical flow results after training with different loss functions

    Fig. 8 Visual comparison of optical flow results on the MPI-Sintel dataset

    Fig. 9 Comparison of optical flow error results on the KITTI2015 dataset

    Fig. 10 Visual comparison of optical flow results for each ablation model; the second and fourth rows are enlarged views of the labeled regions

    Table 1 Optical flow estimation results of image sequences on the MPI-Sintel dataset

    Method                        Clean                         Final
                          All    Matched  Unmatched    All    Matched  Unmatched
    FlowNet 2.0[15]       4.16   1.56     25.40        5.74   2.75     30.11
    PWC-Net[16]           4.39   1.72     26.17        5.04   2.45     26.22
    IRR-PWC_RVC[19]       3.79   2.04     18.04        4.80   2.77     21.34
    FDFlowNet[31]         3.71   1.54     21.38        5.11   2.52     26.23
    FastFlowNet[25]       4.89   1.79     30.18        6.08   2.94     31.69
    Semantic_Lattice[28]  3.84   1.70     21.30        4.89   2.46     24.70
    OAS-Net[29]           3.65   1.49     21.32        5.04   2.46     25.86
    Ours                  3.43   1.31     20.79        4.78   2.32     24.77

    Table 2 Comparison of motion-edge and large-displacement metrics on the MPI-Sintel dataset

    Clean:
    Method                d0-10  d10-60  d60-140  s0-10  s10-40  s40+
    FlowNet 2.0[15]       3.09   1.32    0.92     0.64   1.90    25.42
    PWC-Net[16]           4.28   1.66    0.67     0.61   2.07    28.79
    IRR-PWC_RVC[19]       4.05   1.70    1.04     0.68   2.11    23.23
    FDFlowNet[31]         3.81   1.42    0.69     0.84   2.20    21.63
    FastFlowNet[25]       4.25   1.64    0.91     0.81   2.36    31.24
    Semantic_Lattice[28]  3.86   1.43    0.80     0.60   2.00    24.40
    OAS-Net[29]           3.81   1.39    0.59     0.75   2.13    21.78
    Ours                  3.15   1.15    0.59     0.64   1.78    21.33

    Final:
    Method                d0-10  d10-60  d60-140  s0-10  s10-40  s40+
    FlowNet 2.0[15]       5.14   2.79    2.10     1.24   4.03    34.51
    PWC-Net[16]           4.64   2.09    1.48     0.80   2.99    31.07
    IRR-PWC_RVC[19]       5.06   2.55    1.66     0.81   3.20    28.45
    FDFlowNet[31]         4.67   2.17    1.64     1.03   3.12    30.16
    FastFlowNet[25]       5.20   2.56    2.04     1.07   3.41    37.44
    Semantic_Lattice[28]  4.60   2.08    1.53     0.80   3.02    29.65
    OAS-Net[29]           4.54   2.05    1.57     0.88   2.91    30.63
    Ours                  4.13   1.87    1.59     0.85   2.60    29.51

    Table 3 Estimation results on the KITTI2015 dataset

    Method                Fl-bg    Fl-fg    Fl-all   Time (s)
    FlowNet 2.0[15]       10.75%   8.75%    10.41%   0.12
    PWC-Net[16]           9.66%    9.31%    9.60%    0.07
    IRR-PWC_RVC[19]       7.61%    12.22%   8.38%    0.18
    LiteFlowNet[26]       9.66%    7.99%    9.38%    0.09
    FlowNet3[27]          9.82%    10.91%   10.00%   0.09
    LSM_RVC[30]           7.33%    13.06%   8.28%    0.25
    FDFlowNet[31]         9.31%    9.71%    9.38%    0.05
    Ours                  7.25%    10.06%   7.72%    0.13

    Table 4 Comparison of ablation results on the MPI-Sintel dataset

    Ablation model   All    Matched  Unmatched  d0-10  d10-60  d60-140
    baseline         4.39   1.72     26.17      4.28   1.66    0.67
    baseline_loss    4.03   1.63     23.76      3.17   1.25    0.97
    baseline_md      4.19   1.69     24.58      3.32   1.35    0.98
    full model       3.43   1.31     20.79      3.15   1.15    0.59
  • [1] Fu Jing-Yi, Yu Lei, Yang Wen, Lu Xin. Event-based continuous optical flow estimation. Acta Automatica Sinica, to be published, DOI: 10.16383/j.aas.c210242
    [2] Mahapatra D, Ge Z Y. Training data independent image registration using generative adversarial networks and domain adaptation. Pattern Recognition, 2020, 100: Article No. 107109
    [3] Zhang X S, Jia J P, Cheng Y H, Wang X S. Micro-expression recognition algorithm based on 3D convolutional neural network and optical flow fields from neighboring frames of apex frame. Pattern Recognition and Artificial Intelligence, 2021, 34(5): 423-433 doi: 10.16451/j.cnki.issn1003-6059.202105005
    [4] Feng Cheng, Zhang Cong-Xuan, Chen Zhen, Li Bing, Li Ming. Occlusion detection based on optical flow and multiscale context aggregation. Acta Automatica Sinica, to be published, 2021, DOI: 10.16383/j.aas.c210324
    [5] Bahraini M S, Zenati A, Aouf N. Autonomous cooperative visual navigation for planetary exploration robots. In: Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA). Xi'an, China: IEEE, 2021. 9653−9658
    [6] Zhai M L, Xiang X Z, Lv N, Kong X D. Optical flow and scene flow estimation: A survey. Pattern Recognition, 2021, 114: Article No. 107861
    [7] Rao S N, Wang H Z. Robust optical flow estimation via edge preserving filtering. Signal Processing: Image Communication, 2021, 96: Article No. 116309
    [8] Zhang C X, Chen Z, Wang M R, Li M, Jiang S F. Robust non-local TV-L1 optical flow estimation with occlusion detection. IEEE Transactions on Image Processing, 2017, 26: 4055-4067 doi: 10.1109/TIP.2017.2712279
    [9] Mei L, Lai J H, Xie X H, Zhu J Y, Chen J. Illumination-invariance optical flow estimation using weighted regularization transform. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(2): 495-508 doi: 10.1109/TCSVT.2019.2890861
    [10] Chen J, Cai Z, Lai J H, Xie X H. Efficient segmentation-based PatchMatch for large displacement optical flow estimation. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(12): 3595-3607 doi: 10.1109/TCSVT.2018.2885246
    [11] Deng Y, Xiao J M, Zhou S Z, Feng J S. Detail preserving coarse-to-fine matching for stereo matching and optical flow. IEEE Transactions on Image Processing, 2021, 30: 5835-5847 doi: 10.1109/TIP.2021.3088635
    [12] Zhang C X, Ge L Y, Chen Z, Li M, Liu W, Chen H. Refined TV-L1 optical flow estimation using joint filtering. IEEE Transactions Multimedia, 2020, 22(2): 349-364 doi: 10.1109/TMM.2019.2929934
    [13] Dong C, Wang Z S, Han J M, Xing C D, Tang S F. A non-local propagation filtering scheme for edge-preserving in variational optical flow computation. Signal Processing: Image Communication, 2021, 93: Article No. 116143
    [14] Dosovitskiy A, Fischer P, Ilg E, Hausser P, Hazirbas C, Golkov V. FlowNet: Learning optical flow with convolutional networks. In: Proceedings of the 2015 International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015, 2758−2766
    [15] Ilg E, Mayer N, Saikia T, Keuper M, Dosovitskiy A, Brox T. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In: Proceedings of the 2017 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017, 1647−1655
    [16] Sun D Q, Yang X D, Liu M Y, Jan K. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: Proceedings of the 2018 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE, 2018, 8934−8943
    [17] Yu J J, Harley A W, Derpanis K G. Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness. In: Proceedings of the 2016 European Conference on Computer Vision (ECCV). Amsterdam, The Netherlands: Springer, 2016, 3−10
    [18] Liu P P, King I, Lyu M R, Xu J. DDFlow: Learning optical flow with unlabeled data distillation. In: Proceedings of the 2019 AAAI Conference on Artificial Intelligence. Phoenix, USA: AAAI, 2019, 2−8
    [19] Hur J, Roth S. Iterative residual refinement for joint optical flow and occlusion estimation. In: Proceedings of the 2019 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019, 5747−5756
    [20] Zhao S Y, Sheng Y L, Dong Y, Chang E I, Xu Y. MaskFlownet: Asymmetric feature matching with learnable occlusion mask. In: Proceedings of the 2020 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020, 6277−6286
    [21] Meister S, Hur J, Roth S. UnFlow: Unsupervised learning of optical flow with a bidirectional census loss. In: Proceedings of the 2018 AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI, 2018, 7251−7259
    [22] Zhang C X, Zhou Z Z, Chen Z, Hu W M, Li M, Jiang S F. Self-attention-based multiscale feature learning optical flow with occlusion feature map prediction. IEEE Transactions on Multimedia, 2021: 3340-3354 doi: 10.1109/TMM.2021.3096083, to be published
    [23] Butler D J, Wulff J, Stanley G B, Black M J. A naturalistic open source movie for optical flow evaluation. In: Proceedings of the 2012 European Conference on Computer Vision (ECCV). Florence, Italy: Springer, 2012, 611−625
    [24] Menze M, Geiger A. Object scene flow for autonomous vehicles. In: Proceedings of the 2015 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE, 2015, 3061−3070
    [25] Kong L T, Shen C H, Yang J. FastFlowNet: A lightweight network for fast optical flow estimation. In: Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA). Xi'an, China: IEEE, 2021, 10310−10316
    [26] Hui T W, Tang X O, Loy C C. LiteFlowNet: A lightweight convolutional neural network for optical flow estimation. In: Proceedings of the 2018 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE, 2018, 8981−8989
    [27] Ilg E, Saikia T, Keuper M, Brox T. Occlusions, motion and depth boundaries with a generic network for disparity, optical flow or scene flow estimation. In: Proceedings of the 2018 European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018, 626−643
    [28] Wannenwetsch A S, Kiefel M, Gehler P V, Roth S. Learning task-specific generalized convolutions in the permutohedral lattice. In: Proceedings of the 2019 German Conference on Pattern Recognition (GCPR). Cham, Germany: Springer, 2019, 345−359
    [29] Kong L T, Yang X H, Yang J. OAS-Net: Occlusion aware sampling network for accurate optical flow. In: Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Canada: IEEE, 2021, 2475−2479
    [30] Tang C Z, Yuan L, Tan P. LSM: Learning subspace minimization for low-level vision. In: Proceedings of the 2020 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020, 6234−6245
    [31] Kong L T, Yang J. FDFlowNet: Fast optical flow estimation using a deep lightweight network. In: Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP). Anchorage, USA: IEEE, 2020, 1501−1505
Publication history
  • Received: 2022-05-22
  • Accepted: 2022-07-22
  • Available online: 2022-09-22
