Online Adaptation through Meta-Learning for Stereo Depth Estimation

Zhang Zhen-Yu, Yang Jian

Citation: Zhang Zhen-Yu, Yang Jian. Online adaptation through meta-learning for stereo depth estimation. Acta Automatica Sinica, 2021, x(x): 1−10. doi: 10.16383/j.aas.c200286

doi: 10.16383/j.aas.c200286

Funds: Supported by National Natural Science Foundation of China (U1713208)
    Author Bio:

    ZHANG Zhen-Yu  Ph.D. candidate at the PCA Lab, School of Computer Science and Engineering, Nanjing University of Science and Technology, supervised by Prof. Jian Yang. He received his B.Sc. degree from Nanjing University of Science and Technology in 2015. His research interests include computer vision and deep learning, especially depth estimation. E-mail: zhangjesse@njust.edu.cn

    YANG Jian  Chang Jiang Scholar professor and dean of the School of Computer Science and Engineering, Nanjing University of Science and Technology. He has authored more than 200 papers in pattern recognition and computer vision, which have been cited more than 15000 times on Google Scholar. He is an associate editor of Pattern Recognition, IEEE Transactions on Neural Networks and Learning Systems, and Neurocomputing, and a Fellow of the IAPR. E-mail: csjyang@njust.edu.cn

  • Abstract: Online adaptation for stereo depth estimation is a challenging problem: the model must continuously adjust itself online and adapt to the current environment as the target scene keeps changing. To handle this problem, this paper proposes a novel online meta-learning model with adaptation (OMLA), whose contribution is twofold: first, an online feature alignment method is introduced to handle the distribution shift between target-domain and source-domain features, reducing the impact of domain shift; second, online meta-learning is used to adjust both the feature alignment process and the network weights, so that the model converges quickly. In addition, a new meta-learning-based pre-training method is proposed to obtain deep network parameters suited to online learning scenarios. Experimental analysis shows that both OMLA and the meta-learning pre-training help the model adapt quickly to new scenes; comparisons on the KITTI dataset show that the proposed method outperforms the current best online adaptation algorithms and approaches, or even surpasses, an ideal model trained offline on the target domain.
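To make the two components named in the abstract concrete, here is a minimal PyTorch [42] sketch of what one OMLA-style online step could look like. It is not the authors' code: the names (`omla_step`, `warp_right_to_left`, `inner_lrs`) are hypothetical, the feature alignment shown is the common batch-normalization-statistics variant [29, 43], and the per-parameter step sizes merely stand in for the meta-learned update rule, which may differ in the paper.

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(right, disparity):
    """Warp the right image into the left view by a horizontal shift of the
    predicted disparity (bilinear sampling)."""
    b, _, h, w = right.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, h, device=right.device),
        torch.linspace(-1.0, 1.0, w, device=right.device),
        indexing="ij",
    )
    xs = xs.unsqueeze(0) - 2.0 * disparity.squeeze(1) / w  # normalized shift
    grid = torch.stack((xs, ys.unsqueeze(0).expand_as(xs)), dim=-1)
    return F.grid_sample(right, grid, align_corners=True)

def reconstruction_loss(left, right, disparity):
    """Self-supervised photometric loss: L1 between the left image and the
    right image warped into the left view."""
    return (left - warp_right_to_left(right, disparity)).abs().mean()

def omla_step(model, left, right, inner_lrs, bn_momentum=0.1):
    """One online adaptation step in the spirit of OMLA (illustrative only).

    (1) Online feature alignment: BatchNorm layers are kept in training mode
        so their running statistics drift from the source domain toward the
        incoming target-domain frames.
    (2) Meta-learned update: every parameter gets its own step size (assumed
        to have been learned during meta pre-training) instead of a single
        hand-tuned learning rate.
    """
    for m in model.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            m.train()
            m.momentum = bn_momentum  # speed of source-to-target drift

    disparity = model(left, right)            # predicted disparity map
    loss = reconstruction_loss(left, right, disparity)

    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():                     # per-parameter meta update
        for p, g, lr in zip(params, grads, inner_lrs):
            p.sub_(lr * g)
    return loss.detach()
```

Replacing the per-parameter step sizes with a single scalar learning rate yields plain online fine-tuning, which roughly corresponds to the "Baseline" rows ablated in Table 1 below.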
  • Fig. 1  The proposed meta-learning framework for online adaptation in stereo depth estimation. See Section 2 for details.
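The meta pre-training stage of the framework in Fig. 1 can be illustrated, at a first-order level, by a Reptile-style loop over short source-domain clips. The sketch below (reusing `reconstruction_loss` from the previous sketch) is an assumed approximation in the spirit of the MAML line of work [37], not the paper's actual objective.

```python
import copy
import torch

def meta_pretrain_step(model, clips, inner_lr=1e-4, meta_lr=1e-2, inner_steps=3):
    """One first-order meta pre-training step (Reptile-style sketch).

    Each clip is a list of (left, right) stereo pairs from one source-domain
    video. A copy of the model simulates the online-adaptation phase on the
    clip; the base weights then move toward the adapted weights, so that a
    few gradient steps suffice to fit a new scene at deployment time.
    """
    base = list(model.parameters())
    for clip in clips:
        fast = copy.deepcopy(model)                 # adaptation copy
        opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for left, right in clip[:inner_steps]:      # simulated online run
            loss = reconstruction_loss(left, right, fast(left, right))
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():                       # outer (meta) update
            for p, q in zip(base, fast.parameters()):
                p.add_(meta_lr * (q - p))
```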

    Fig. 2  The proposed online meta-learning with adaptation (OMLA) method. See Sections 2.1 and 2.2 for details.

    Fig. 3  Performance on three different video sequences from the KITTI Eigen test split. To show how the online adaptation evolves over time, predictions on the initial, middle, and final frames of each sequence are shown.

    Table 1  Ablation study on the KITTI Eigen test set. Depth is evaluated only within 50 meters.

(Columns 3−6: average over all frames; columns 7−10: average over the last 20% of frames.)

| Method | Pre-training | RMSE | Abs Rel | Sq Rel | $RMSE_{log}$ | RMSE | Abs Rel | Sq Rel | $RMSE_{log}$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | none | 12.2012 | 0.4357 | 5.5672 | 1.3598 | 12.2874 | 0.4452 | 5.5213 | 1.3426 |
| Baseline | standard | 9.0518 | 0.2499 | 3.2901 | 0.9503 | 9.0309 | 0.2512 | 3.3104 | 0.9495 |
| Online feature alignment only | standard | 3.6135 | 0.1250 | 0.6972 | 0.2041 | 3.5857 | 0.1031 | 0.6887 | 0.1910 |
| OMLA | standard | 3.5027 | 0.0923 | 0.6611 | 0.1896 | 3.3986 | 0.0882 | 0.6579 | 0.1735 |
| Baseline | meta | 8.8230 | 0.2305 | 3.0578 | 0.9324 | 8.7061 | 0.2273 | 2.9804 | 0.9065 |
| Online feature alignment only | meta | 3.5043 | 0.0950 | 0.6627 | 0.1992 | 3.4831 | 0.0896 | 0.6545 | 0.1921 |
| OMLA | meta | $\bf 3.4051$ | $\bf 0.0864$ | $\bf 0.6256$ | $\bf 0.1852$ | $\bf 3.3803$ | $\bf 0.0798$ | $\bf 0.6176$ | $\bf 0.1801$ |
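For reference, the columns above are the standard depth-estimation error and accuracy measures. A minimal NumPy sketch of how they are commonly computed is given below; the function name and the clipping constant are illustrative choices, not taken from the paper.

```python
import numpy as np

def depth_metrics(pred, gt, max_depth=50.0):
    """Standard depth metrics (RMSE, Abs Rel, Sq Rel, RMSE_log and the
    threshold accuracies), evaluated only on pixels whose ground-truth
    depth is valid and within max_depth -- the 50 m cap used above."""
    mask = (gt > 0) & (gt < max_depth)
    pred = np.clip(pred[mask], 1e-3, max_depth)  # guard the log terms
    gt = gt[mask]

    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    sq_rel = np.mean((pred - gt) ** 2 / gt)
    rmse_log = np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))

    # fraction of pixels whose prediction/ground-truth ratio (taken in the
    # worse direction) falls below 1.25, 1.25^2 and 1.25^3
    ratio = np.maximum(pred / gt, gt / pred)
    accs = [float(np.mean(ratio < 1.25 ** k)) for k in (1, 2, 3)]
    return rmse, abs_rel, sq_rel, rmse_log, accs
```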

    Table 2  Comparison across different network architectures and pre-training datasets.

(Columns 3−6: pre-trained on Synthia [20]; columns 7−10: pre-trained on SceneFlow [41].)

| Network | Method | RMSE | Abs Rel | Sq Rel | $RMSE_{log}$ | RMSE | Abs Rel | Sq Rel | $RMSE_{log}$ | FPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet[9] | Baseline | 9.0518 | 0.2499 | 3.2901 | 0.8577 | 9.0893 | 0.2602 | 3.3896 | 0.8901 | $\bf 5.06$ |
| ResNet[9] | OMLA + meta pre-training | $\bf 3.4051$ | $\bf 0.0864$ | $\bf 0.6256$ | $\bf 0.1852$ | $\bf 4.0573$ | $\bf 0.1231$ | $\bf 1.1532$ | $\bf 0.1985$ | 3.40 |
| MADNet[15] | Baseline | 8.8650 | 0.2684 | 3.1503 | 0.8233 | 8.9823 | 0.2790 | 3.3021 | 0.8350 | $\bf 12.05$ |
| MADNet[15] | OMLA + meta pre-training | $\bf 4.0236$ | $\bf 0.1756$ | $\bf 1.1825$ | $\bf 0.2501$ | $\bf 4.2179$ | $\bf 0.1883$ | $\bf 1.2761$ | $\bf 0.2523$ | 9.56 |
| DispNet[41] | Baseline | 9.0222 | 0.2710 | 4.3281 | 0.9452 | 9.1587 | 0.2805 | 4.3590 | 0.9528 | $\bf 5.42$ |
| DispNet[41] | OMLA + meta pre-training | $\bf 4.5201$ | $\bf 0.2396$ | $\bf 1.3104$ | $\bf 0.2503$ | $\bf 4.6314$ | $\bf 0.2457$ | $\bf 1.3541$ | $\bf 0.2516$ | 4.00 |

    Table 3  Comparison with ideal models and the state-of-the-art method. Only pixels with ground-truth depth below 50 m are evaluated.

(Rows pre-trained on the target domain serve as the ideal reference models.)

| Network | Online adaptation | Pre-training domain | RMSE | Abs Rel | Sq Rel | $RMSE_{log}$ | $\alpha<1.25$ | $\alpha<1.25^2$ | $\alpha<1.25^3$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet[9] | — | target | 3.6975 | 0.0983 | 1.1720 | 0.1923 | 0.9166 | 0.9580 | 0.9778 |
| ResNet[9] | Baseline | target | 3.4359 | $\bf 0.0850$ | 0.6547 | 0.1856 | $\bf 0.9203$ | 0.9612 | 0.9886 |
| ResNet[9] | L2A[35] | source | 3.5030 | 0.0913 | 0.6522 | $\bf 0.1840$ | 0.9170 | 0.9611 | 0.9882 |
| ResNet[9] | OMLA + meta pre-training | source | $\bf 3.4051$ | 0.0864 | $\bf 0.6256$ | 0.1852 | 0.9170 | $\bf 0.9623$ | $\bf 0.9901$ |
| MADNet[15] | — | target | $\bf 3.8965$ | 0.1793 | 1.2369 | $\bf 0.2457$ | 0.9147 | 0.9601 | 0.9790 |
| MADNet[15] | Baseline | target | 3.9023 | 0.1760 | 1.1902 | 0.2469 | $\bf 0.9233$ | 0.9652 | 0.9813 |
| MADNet[15] | L2A[35] | source | 4.1506 | 0.1788 | 1.1935 | 0.2533 | 0.9131 | 0.9443 | 0.9786 |
| MADNet[15] | OMLA + meta pre-training | source | 4.0236 | $\bf 0.1756$ | $\bf 1.1825$ | 0.2501 | 0.9022 | $\bf 0.9658$ | $\bf 0.9842$ |
| DispNet[41] | — | target | 4.5210 | 0.2433 | $\bf 1.2801$ | $\bf 0.2490$ | 0.9126 | 0.9472 | $\bf 0.9730$ |
| DispNet[41] | Baseline | target | 4.5327 | $\bf 0.2368$ | 1.2853 | 0.2506 | $\bf 0.9178$ | $\bf 0.9600$ | 0.9725 |
| DispNet[41] | L2A[35] | source | 4.6217 | 0.2410 | 1.2902 | 0.2593 | 0.9062 | 0.9513 | 0.9688 |
| DispNet[41] | OMLA + meta pre-training | source | $\bf 4.5201$ | 0.2396 | 1.3104 | 0.2503 | 0.9085 | 0.9460 | 0.9613 |
  • [1] Eigen D, Fergus R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the IEEE International Conference on Computer Vision, 2015. 2650−2658
    [2] Laina I, Rupprecht C, Belagiannis V, Tombari F, Navab N. Deeper depth prediction with fully convolutional residual networks. In: Proceedings of the International Conference on 3D Vision, 2016. 239−248
    [3] Fu H, Gong M, Wang C, Batmanghelich K, Tao D. Deep ordinal regression network for monocular depth estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 2002−2011
    [4] Xu D, Wang W, Tang H, Liu H, Sebe N, Ricci E. Structured attention guided convolutional neural fields for monocular depth estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 3917−3925
    [5] Zhang Z, Cui Z, Xu C, Jie Z, Li X, Yang J. Joint task-recursive learning for semantic segmentation and depth estimation. In: Proceedings of the European Conference on Computer Vision, 2018. 235−251
    [6] Wofk D, Ma F, Yang T, Karaman S, Sze V. FastDepth: Fast monocular depth estimation on embedded systems. In: Proceedings of the International Conference on Robotics and Automation, 2019. 6101−6108
    [7] Ma F, Karaman S. Sparse-to-dense: Depth prediction from sparse depth samples and a single image. In: Proceedings of the International Conference on Robotics and Automation, 2018. 1−8
    [8] Garg R, BG V K, Carneiro G, Reid I. Unsupervised CNN for single view depth estimation: Geometry to the rescue. In: Proceedings of the European Conference on Computer Vision, 2016. 740−756
    [9] Godard C, Mac Aodha O, Brostow G J. Unsupervised monocular depth estimation with left-right consistency. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 270−279
    [10] Pilzer A, Xu D, Puscas M, Ricci E, Sebe N. Unsupervised adversarial depth estimation using cycled generative networks. In: Proceedings of the International Conference on 3D Vision, 2018. 587−595
    [11] Pillai S, Ambruş R, Gaidon A. SuperDepth: Self-supervised, super-resolved monocular depth estimation. In: Proceedings of the International Conference on Robotics and Automation, 2019. 9250−9256
    [12] Mancini M, Karaoguz H, Ricci E, Jensfelt P, Caputo B. Kitting in the wild through online domain adaptation. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018. 1103−1109
    [13] Carlucci F M, Porzi L, Caputo B, Ricci E, Rota Bulò S. AutoDIAL: Automatic domain alignment layers. In: Proceedings of the IEEE International Conference on Computer Vision, 2017. 5077−5085
    [14] Menze M, Geiger A. Object scene flow for autonomous vehicles. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. 3061−3070
    [15] Tonioni A, Tosi F, Poggi M, Mattoccia S, Di Stefano L. Real-time self-adaptive deep stereo. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 195−204
    [16] Liu F, Shen C, Lin G, Reid I. Learning depth from single monocular images using deep convolutional neural fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38(10): 2024−2039
    [17] Yang N, Wang R, Stückler J, Cremers D. Deep virtual stereo odometry: Leveraging deep depth prediction for monocular direct sparse odometry. In: Proceedings of the European Conference on Computer Vision, 2018. 817−833
    [18] Silberman N, Hoiem D, Kohli P, Fergus R. Indoor segmentation and support inference from RGBD images. In: Proceedings of the European Conference on Computer Vision, 2012. 746−760
    [19] Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012. 3354−3361
    [20] Ros G, Sellart L, Materzynska J, Vazquez D, Lopez A M. The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 3234−3243
    [21] Zhou T, Brown M, Snavely N, Lowe D. Unsupervised learning of depth and ego-motion from video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 1851−1858
    [22] Nath Kundu J, Krishna Uppala P, Pahuja A, Venkatesh Babu R. AdaDepth: Unsupervised content congruent adaptation for depth estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 2656−2665
    [23] Csurka G. Domain adaptation for visual applications: A comprehensive survey. arXiv preprint arXiv:1702.05374, 2017
    [24] Long M, Zhu H, Wang J, Jordan M I. Deep transfer learning with joint adaptation networks. In: Proceedings of the International Conference on Machine Learning, 2017. 2208−2217
    [25] Venkateswara H, Eusebio J, Chakraborty S, Panchanathan S. Deep hashing network for unsupervised domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 5018−5027
    [26] Bousmalis K, Irpan A, Wohlhart P, Bai Y, Kelcey M, Kalakrishnan M, et al. Using simulation and domain adaptation to improve efficiency of deep robotic grasping. In: Proceedings of the IEEE International Conference on Robotics and Automation, 2018. 4243−4250
    [27] Sankaranarayanan S, Balaji Y, Castillo C D, Chellappa R. Generate to adapt: Aligning domains using generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 8503−8512
    [28] Wulfmeier M, Bewley A, Posner I. Incremental adversarial domain adaptation for continually changing environments. In: Proceedings of the IEEE International Conference on Robotics and Automation, 2018. 1−9
    [29] Li Y, Wang N, Shi J, Liu J, Hou X. Revisiting batch normalization for practical domain adaptation. arXiv preprint arXiv:1603.04779, 2016
    [30] Mancini M, Porzi L, Rota Bulò S, Caputo B, Ricci E. Boosting domain adaptation by discovering latent domains. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 3771−3780
    [31] Tobin J, Fong R, Ray A, Schneider J, Zaremba W, Abbeel P. Domain randomization for transferring deep neural networks from simulation to the real world. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017. 23−30
    [32] Tonioni A, Poggi M, Mattoccia S, Di Stefano L. Unsupervised adaptation for deep stereo. In: Proceedings of the IEEE International Conference on Computer Vision, 2017. 1605−1613
    [33] Pang J, Sun W, Yang C, Ren J, Xiao R, Zeng J, Lin L. Zoom and learn: Generalizing deep stereo matching to novel domains. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 2070−2079
    [34] Zhao S, Fu H, Gong M, Tao D. Geometry-aware symmetric domain adaptation for monocular depth estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 9788−9798
    [35] Tonioni A, Rahnama O, Joy T, Di Stefano L, Ajanthan T, Torr P H S. Learning to adapt for stereo. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 9661−9670
    [36] Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K, Wierstra D. Matching networks for one shot learning. In: Advances in Neural Information Processing Systems, 2016. 3630−3638
    [37] Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the International Conference on Machine Learning, 2017. 1126−1135
    [38] Guo Y, Shi H, Kumar A, Rosing T, Feris R. SpotTune: Transfer learning through adaptive fine-tuning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 4805−4814
    [39] Park E, Berg A C. Meta-Tracker: Fast and robust online adaptation for visual object trackers. In: Proceedings of the European Conference on Computer Vision, 2018. 569−585
    [40] Wang P, Shen X, Lin Z, Cohen S, Price B, Yuille A. Towards unified depth and semantic prediction from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. 2800−2809
    [41] Mayer N, Ilg E, Häusser P, Fischer P, Cremers D, Dosovitskiy A, Brox T. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 4040−4048
    [42] Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, et al. Automatic differentiation in PyTorch. In: NIPS Autodiff Workshop, 2017
    [43] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015
    [44] Kingma D P, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014
Publication history
  • Received: 2020-05-07
  • Accepted: 2020-09-14
  • Published online: 2021-07-05
