Abstract: To improve the accuracy and robustness of motion occlusion detection under non-rigid motion and large displacements, we propose a motion occlusion detection method for image sequences based on optical flow and multiscale context. First, we design a multiscale context aggregation network based on dilated convolutions, which exploits the multiscale context information of the image sequence to capture image features over a wider range. Then, using a feature pyramid, we construct an end-to-end motion occlusion detection network based on multiscale context and optical flow, in which optical flow is used to refine the occlusion estimates in regions of non-rigid motion and large displacement. Finally, we design a motion-edge-based training loss function for the network to obtain accurate motion occlusion boundaries. We compare our method with existing representative occlusion detection models on the MPI-Sintel and KITTI test datasets. The experimental results show that the proposed method effectively improves the accuracy of motion occlusion detection, and is especially more robust in challenging scenes with non-rigid motion and large displacements.
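The multiscale context aggregation described in the abstract builds on dilated (atrous) convolutions [30]. A minimal 1D numpy sketch of the mechanism (illustrative only, not the authors' network):

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1D convolution with a dilation (atrous) rate.

    A kernel of size k with dilation d covers a receptive field of
    (k - 1) * d + 1 input samples, so stacking layers with growing
    dilation rates aggregates context at multiple scales without
    adding parameters or reducing resolution.
    """
    k = len(kernel)
    span = (k - 1) * dilation + 1          # effective receptive field
    out_len = len(signal) - span + 1
    out = np.zeros(out_len)
    for i in range(out_len):
        # Sample the input at dilated (strided) positions.
        taps = signal[i : i + span : dilation]
        out[i] = np.dot(taps, kernel)
    return out

signal = np.arange(16, dtype=float)
kernel = np.array([1.0, 1.0, 1.0])

# Same 3-tap kernel, increasing dilation -> wider context per output.
for d in (1, 2, 4):
    y = dilated_conv1d(signal, kernel, d)
    print(d, (len(kernel) - 1) * d + 1, y[:3])
```

Stacking such layers with increasing dilation rates (e.g. 1, 2, 4, ...) grows the receptive field exponentially while keeping the parameter count and spatial resolution fixed, which is the property the multiscale context network exploits.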
Key words:
- image sequence
- occlusion detection
- deep learning
- multiscale context
- non-rigid motion
Table 1 Comparison of average F1 scores on the MPI-Sintel dataset
Table 2 Comparison of average omission and false detection rates on the MPI-Sintel dataset
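The F1 score together with omission and false detection rates can all be computed from binary occlusion masks. A numpy sketch under assumed definitions (the paper's exact formulas are not reproduced here, and `occlusion_metrics` is a hypothetical helper):

```python
import numpy as np

def occlusion_metrics(pred, gt):
    """F1 score, omission (miss) rate and false detection rate for
    binary occlusion masks (1 = occluded).

    Assumed definitions: omission = missed occluded pixels / occluded
    pixels; false rate = wrongly flagged pixels / non-occluded pixels.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.sum(pred & gt)     # correctly detected occluded pixels
    fp = np.sum(pred & ~gt)    # false detections
    fn = np.sum(~pred & gt)    # missed occluded pixels
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    omission = fn / max(np.sum(gt), 1)
    false_rate = fp / max(np.sum(~gt), 1)
    return f1, omission, false_rate

gt   = np.array([[1, 1, 0, 0], [1, 0, 0, 0]])  # ground-truth occlusion
pred = np.array([[1, 0, 0, 0], [1, 0, 1, 0]])  # predicted occlusion
print(occlusion_metrics(pred, gt))
```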
Table 3 Comparison of average F1 scores for occlusion detection on non-rigid motion and large-displacement image sequences
| Method | alley_2 (clean) | ambush_2 (clean) | market_6 (clean) | temple_2 (clean) | alley_2 (final) | ambush_2 (final) | market_6 (final) | temple_2 (final) |
|---|---|---|---|---|---|---|---|---|
| UnFlow [24] | 0.4149 | 0.4313 | 0.4330 | 0.3243 | 0.4057 | 0.3920 | 0.4499 | 0.3120 |
| Back2Future [25] | 0.6816 | 0.5888 | 0.6290 | 0.2712 | 0.6756 | 0.5199 | 0.6239 | 0.2683 |
| MaskFlownet [27] | 0.5057 | 0.5403 | 0.4660 | 0.3838 | 0.5039 | 0.4085 | 0.4735 | 0.3508 |
| IRR-PWC [26] | 0.8709 | 0.9172 | 0.8155 | 0.7404 | 0.8770 | 0.7809 | 0.8023 | 0.6905 |
| Ours | 0.8811 | 0.9216 | 0.8304 | 0.7747 | 0.8764 | 0.7959 | 0.8106 | 0.7103 |

Table 4 Comparison of time consumption of different methods (bold indicates the best result)
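Several of the compared flow-based methods (e.g. UnFlow [24]) infer occlusion with a forward-backward flow consistency check: a pixel is occluded if its forward flow, composed with the backward flow at the landing point, does not return to the start. A minimal numpy sketch of that classical baseline (not the network proposed in this paper; the thresholds `alpha` and `beta` follow common practice and are assumptions):

```python
import numpy as np

def fb_consistency_occlusion(flow_fw, flow_bw, alpha=0.01, beta=0.5):
    """Mark pixels occluded where forward and backward flow disagree.

    flow_fw, flow_bw: (H, W, 2) arrays of (dx, dy) displacements.
    Returns a boolean (H, W) occlusion mask.
    """
    h, w = flow_fw.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Where each pixel lands under the forward flow (nearest neighbour).
    x2 = np.clip(np.round(xs + flow_fw[..., 0]).astype(int), 0, w - 1)
    y2 = np.clip(np.round(ys + flow_fw[..., 1]).astype(int), 0, h - 1)
    flow_bw_warped = flow_bw[y2, x2]   # backward flow at the landing point
    diff = flow_fw + flow_bw_warped    # should cancel out for visible pixels
    sq_diff = np.sum(diff ** 2, axis=-1)
    sq_mag = np.sum(flow_fw ** 2, axis=-1) + np.sum(flow_bw_warped ** 2, axis=-1)
    return sq_diff > alpha * sq_mag + beta

h, w = 4, 5
flow_fw = np.zeros((h, w, 2)); flow_fw[..., 0] = 1.0   # everything moves right
flow_bw = np.zeros((h, w, 2)); flow_bw[..., 0] = -1.0  # and back again
flow_bw[2, 3] = [9.0, 0.0]   # inconsistent backward flow at one landing point
occ = fb_consistency_occlusion(flow_fw, flow_bw)
print(occ.sum(), occ[2, 2])
```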
Table 5 Comparison of average F1 scores over the whole image sequences on MPI-Sintel (bold indicates the best result)
F1 scores measured on the MPI-Sintel training dataset (clean and final passes):

| Model variant | F1 (clean) | F1 (final) | Runtime | Training time |
|---|---|---|---|---|
| Full model | 0.75 | 0.72 | 0.19 s | 13 days |
| w/o multiscale context network | 0.72 | 0.68 | 0.18 s | 12 days |
| w/o edge loss function | 0.74 | 0.71 | 0.19 s | 13 days |

Table 6 Comparison of average F1 scores of the whole image sequences within different motion boundary regions on MPI-Sintel (bold indicates the best result)
| Model variant | N=1 (clean) | N=3 (clean) | N=5 (clean) | N=10 (clean) | N=1 (final) | N=3 (final) | N=5 (final) | N=10 (final) |
|---|---|---|---|---|---|---|---|---|
| Full model | 0.63 | 0.67 | 0.69 | 0.71 | 0.59 | 0.62 | 0.64 | 0.67 |
| w/o multiscale context network | 0.59 | 0.62 | 0.65 | 0.67 | 0.55 | 0.59 | 0.61 | 0.63 |
| w/o edge loss function | 0.60 | 0.64 | 0.67 | 0.69 | 0.56 | 0.60 | 0.62 | 0.64 |
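The ablation rows above remove the motion-edge loss term. One common way to realize an edge-aware occlusion loss is to up-weight the per-pixel cross-entropy on motion-edge pixels; a numpy sketch (the weighting scheme and `edge_weight` value are illustrative, not the paper's exact loss):

```python
import numpy as np

def edge_weighted_bce(pred, target, edge_mask, edge_weight=4.0):
    """Binary cross-entropy up-weighted on motion-edge pixels.

    pred: predicted occlusion probabilities in (0, 1).
    target: binary ground-truth occlusion labels.
    edge_mask: 1.0 on motion-edge pixels, 0.0 elsewhere.
    """
    eps = 1e-7
    pred = np.clip(pred, eps, 1.0 - eps)   # numerical safety for log
    bce = -(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    weights = 1.0 + (edge_weight - 1.0) * edge_mask  # boost edge pixels
    return float(np.sum(weights * bce) / np.sum(weights))

pred   = np.array([0.9, 0.2, 0.6, 0.1])
target = np.array([1.0, 0.0, 1.0, 0.0])
edges  = np.array([0.0, 0.0, 1.0, 0.0])  # third pixel lies on a motion edge
print(edge_weighted_bce(pred, target, edges))
```

The effect is that errors near motion boundaries dominate the gradient, pushing the network toward the sharper occlusion boundaries the ablation shows the edge loss provides.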
[1] Zhang Shi-Hui, He Qi, Dong Li-Jian, Du Xue-Zhe. Dynamic occlusion avoidance approach by means of occlusion region model and object motion estimation. Acta Automatica Sinica, 2019, 45(4): 771−786. (in Chinese)
[2] Yu C, Bo Y, Bo W, Yan W D, Robby T. Occlusion-aware networks for 3D human pose estimation in video. In: Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea: IEEE, 2019. 723−732.
[3] Zhang Cong-Xuan, Chen Zhen, Xiong Fan, Li Ming, Ge Li-Yue, Chen Hao. Large displacement motion optical flow estimation with non-rigid dense patch matching. Acta Electronica Sinica, 2019, 47(6): 1316−1323. doi: 10.3969/j.issn.0372-2112.2019.06.019 (in Chinese)
[4] Yao Nai-Ming, Guo Qing-Pei, Qiao Feng-Chun, Chen Hui, Wang Hong-An. Robust facial expression recognition with generative adversarial networks. Acta Automatica Sinica, 2018, 44(5): 865−877. (in Chinese)
[5] Pan J Y, Bo H. Robust occlusion handling in object tracking. In: Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, USA: IEEE, 2007. 1−8.
[6] Liu Xin, Xu Hua-Rong, Hu Zhan-Yi. GPU based fast 3D-object modeling with Kinect. Acta Automatica Sinica, 2012, 38(8): 1288−1297. (in Chinese)
[7] Zhang Cong-Xuan, Chen Zhen, Li Ming. Review of the 3D reconstruction technology based on optical flow of monocular image sequence. Acta Electronica Sinica, 2016, 44(12): 3044−3052. doi: 10.3969/j.issn.0372-2112.2016.12.033 (in Chinese)
[8] Bailer C, Taetz B, Stricker D. Flow Fields: Dense correspondence fields for highly accurate large displacement optical flow estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1879−1892. doi: 10.1109/TPAMI.2018.2859970
[9] Wolf L, Gadot D. PatchBatch: A batch augmented loss for optical flow. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA: IEEE, 2016. 4236−4245.
[10] Li Y S, Song R, Hu Y L. Efficient coarse-to-fine patch match for large displacement optical flow. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA: IEEE, 2016. 5704−5712.
[11] Menze M, Heipke C, Geiger A. Discrete optimization for optical flow. In: Proceedings of the 2015 German Conference on Pattern Recognition (GCPR), Aachen, Germany: Springer, 2015. 16−28.
[12] Chen Q F, Koltun V. Full flow: Optical flow estimation by global optimization over regular grids. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA: IEEE, 2016. 4706−4714.
[13] Guney F, Geiger A. Deep discrete flow. In: Proceedings of the 2016 Asian Conference on Computer Vision (ACCV), Taipei, Taiwan, China: Springer, 2016. 207−224.
[14] Hur J, Roth S. Joint optical flow and temporally consistent semantic segmentation. In: Proceedings of the 2016 European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands: Springer, 2016. 163−177.
[15] Ince S, Konrad J. Occlusion-aware optical flow estimation. IEEE Transactions on Image Processing, 2008, 17(8): 1443−1451. doi: 10.1109/TIP.2008.925381
[16] Sun D Q, Liu C, Pfister H. Local layering for joint motion estimation and occlusion detection. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, Ohio, USA: IEEE, 2014. 1098−1105.
[17] Sun D Q, Sudderth E B, Black M J. Layered image motion with explicit occlusions, temporal consistency, and depth ordering. In: Proceedings of the 24th International Conference on Neural Information Processing Systems (NIPS), Vancouver, Canada: Curran Associates Inc., 2010. 2226−2234.
[18] Vogel C, Roth S, Schindler K. View-consistent 3D scene flow estimation over multiple frames. In: Proceedings of the 2014 European Conference on Computer Vision (ECCV), Zurich, Switzerland: Springer, 2014. 263−278.
[19] Zanfir A, Sminchisescu C. Large displacement 3D scene flow with occlusion reasoning. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile: IEEE, 2015. 4417−4425.
[20] Zhang C X, Chen Z, Wang M R, Li M, Jiang S F. Robust non-local TV-L1 optical flow estimation with occlusion detection. IEEE Transactions on Image Processing, 2017, 26(8): 4055−4067. doi: 10.1109/TIP.2017.2712279
[21] Zhang Cong-Xuan, Chen Zhen, Wang Ming-Run, Li Ming, Jiang Shao-Feng. Motion occlusion detecting from image sequence based on optical flow and Delaunay triangulation. Acta Electronica Sinica, 2018, 46(2): 479−485. doi: 10.3969/j.issn.0372-2112.2018.02.030 (in Chinese)
[22] Kennedy R, Taylor C J. Optical flow with geometric occlusion estimation and fusion of multiple frames. In: Proceedings of the 2015 International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), Hong Kong, China: Springer, 2015. 364−377.
[23] Yu J J, Harley A W, Derpanis K G. Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness. In: Proceedings of the 2016 European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands: Springer, 2016. 3−10.
[24] Meister S, Hur J, Roth S. UnFlow: Unsupervised learning of optical flow with a bidirectional census loss. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, USA: AAAI, 2018. 7251−7259.
[25] Janai J, Güney F, Ranjan A, Black M, Geiger A. Unsupervised learning of multi-frame optical flow with occlusions. In: Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany: Springer, 2018. 713−731.
[26] Hur J, Roth S. Iterative residual refinement for joint optical flow and occlusion estimation. In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA: IEEE, 2019. 5747−5756.
[27] Zhao S Y, Sheng Y L, Dong Y, Chang E I C, Xu Y. MaskFlownet: Asymmetric feature matching with learnable occlusion mask. In: Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual: IEEE, 2020. 6277−6286.
[28] Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, Massachusetts, USA: IEEE, 2015. 1−9.
[29] Chen L C, Zhu Y K, Papandreou G, Schroff F, Adam H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany: Springer, 2018. 833−851.
[30] Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. [Online], available: https://arxiv.org/abs/1511.07122, Apr 30, 2016.
[31] Yang M K, Yu K, Zhang C, Li Z W, Yang K Y. DenseASPP for semantic segmentation in street scenes. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, Utah, USA: IEEE, 2018. 3684−3692.
[32] Mehta S, Rastegari M, Caspi A, Shapiro L, Hajishirzi H. ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation. In: Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany: Springer, 2018. 561−580.
[33] Butler D J, Wulff J, Stanley G B, Black M J. A naturalistic open source movie for optical flow evaluation. In: Proceedings of the 2012 European Conference on Computer Vision (ECCV), Florence, Italy: Springer, 2012. 611−625.
[34] Menze M, Geiger A. Object scene flow for autonomous vehicles. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, Massachusetts, USA: IEEE, 2015. 3061−3070.