Online Adaptation Through Meta-learning for Stereo Depth Estimation

Zhang Zhen-Yu, Yang Jian

Citation: Zhang Zhen-Yu, Yang Jian. Online adaptation through meta-learning for stereo depth estimation. Acta Automatica Sinica, 2023, 49(7): 1446−1455 doi: 10.16383/j.aas.c200286

doi: 10.16383/j.aas.c200286
Funds: Supported by National Natural Science Foundation of China (U1713208)
    Author Bio:

    ZHANG Zhen-Yu  Ph.D. candidate at the PCA Laboratory, School of Computer Science and Engineering, Nanjing University of Science and Technology. He received his bachelor degree from the Department of Information and Computing Science, School of Science, Nanjing University of Science and Technology in 2015. His research interests cover computer vision and deep learning, especially vision-based depth estimation. E-mail: zhangjesse@njust.edu.cn

    YANG Jian  Professor at the School of Computer Science and Engineering, Nanjing University of Science and Technology. He is also a Changjiang Scholar and IAPR Fellow. His research interests cover matrix regression and visual perception in autonomous driving and robotics. Corresponding author of this paper. E-mail: csjyang@njust.edu.cn

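  • Abstract: Online adaptation of stereo depth estimation is a challenging problem: it requires the model to continuously adjust itself online and adapt to the current environment in ever-changing target scenes. To address it, we propose a new online meta-learning model with adaptation (OMLA), whose contributions are twofold. First, an online feature alignment method is introduced to handle the distribution shift between target-domain and source-domain features, reducing the impact of domain shift. Second, an online meta-learning method is used to adjust the feature alignment process and the network weights so that the model converges quickly. In addition, a new meta-learning-based pre-training method is proposed to obtain deep network parameters suited to the online learning setting. Experimental analysis shows that both OMLA and the meta-learning pre-training help the model adapt quickly to new scenes. Comparative experiments on the KITTI dataset show that the proposed method outperforms the current state-of-the-art online adaptation algorithm and approaches, or even surpasses, the ideal model trained offline on the target domain.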
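    The abstract names online feature alignment but this page does not give its update rule. One common way to realize such alignment is to keep running feature statistics and move them toward the target domain frame by frame, in the spirit of batch normalization [43] and AdaBN [29]. The following is a minimal sketch under that assumption; the class `OnlineFeatureAligner`, the `momentum` value, and the toy feature values are all hypothetical and are not taken from the paper.

```python
import math


class OnlineFeatureAligner:
    """Tracks running mean/variance of one feature channel and normalizes with them.

    Hypothetical helper for illustration; not the authors' implementation.
    """

    def __init__(self, mean=0.0, var=1.0, momentum=0.1, eps=1e-5):
        self.mean = mean          # statistics inherited from source-domain pre-training
        self.var = var
        self.momentum = momentum  # assumed update rate; the abstract gives no value
        self.eps = eps

    def update(self, features):
        """Blend the statistics of the current target-domain frame into the running ones."""
        n = len(features)
        batch_mean = sum(features) / n
        batch_var = sum((x - batch_mean) ** 2 for x in features) / n
        m = self.momentum
        self.mean = (1.0 - m) * self.mean + m * batch_mean
        self.var = (1.0 - m) * self.var + m * batch_var

    def normalize(self, features):
        """Standardize features with the current running statistics."""
        scale = 1.0 / math.sqrt(self.var + self.eps)
        return [(x - self.mean) * scale for x in features]


# Source-domain statistics are (mean 0, variance 1); the toy target frame has
# mean 5 and variance 4, so the running statistics drift toward those values.
aligner = OnlineFeatureAligner()
target_frame = [3.0, 7.0]
for _ in range(200):
    aligner.update(target_frame)
aligned = aligner.normalize(target_frame)
```

    In this sketch the momentum controls how quickly the statistics follow the target scene: a larger value adapts faster but is noisier. The abstract states that OMLA additionally uses meta-learning to adjust the alignment process, which this fixed-momentum sketch does not attempt.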
  • Fig. 1  The proposed meta-learning framework for online stereo adaptation

    Fig. 2  The proposed online meta-learning with adaptation (OMLA) method

    Fig. 3  Performance on three different video sequences of the KITTI Eigen test set (to show how the online adaptation evolves over time, predictions are shown for the initial, middle, and final frames of each video)
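    Fig. 2 refers to online meta-learning, and the reference list includes model-agnostic meta-learning (MAML) [37]. This page does not reproduce the OMLA update itself, so the following is only a sketch of the generic MAML idea on a toy scalar model: adapt on one batch (the "support" data), then update the initial weight so that the adapted weight performs well on the next batch (the "query" data). The function names, the learning rates, and the data are hypothetical, and treating consecutive batches as support and query is an assumption.

```python
def loss_and_grad(w, xs, ys):
    """Mean squared error of the model y = w * x and its derivative w.r.t. w."""
    n = len(xs)
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / n
    return loss, grad


def hessian(xs):
    """Second derivative of the mean squared error w.r.t. w (constant in w)."""
    return sum(2.0 * x * x for x in xs) / len(xs)


def maml_step(w, support, query, inner_lr=0.05, outer_lr=0.05):
    """One meta-update: inner gradient step on support, outer step on query loss."""
    sx, sy = support
    qx, qy = query
    _, g_support = loss_and_grad(w, sx, sy)
    w_adapted = w - inner_lr * g_support               # inner (adaptation) step
    q_loss, g_query = loss_and_grad(w_adapted, qx, qy)
    # Chain rule through the inner step: d(w_adapted)/dw = 1 - inner_lr * Hessian.
    meta_grad = (1.0 - inner_lr * hessian(sx)) * g_query
    return w - outer_lr * meta_grad, q_loss            # outer (meta) step


# Toy data generated by y = 2 * x; support and query play the role of two
# consecutive batches (an assumption made for this illustration).
w = 0.0
support = ([1.0, 2.0], [2.0, 4.0])
query = ([3.0], [6.0])
for _ in range(50):
    w, query_loss = maml_step(w, support, query)
```

    For this quadratic loss the second-order term is available in closed form, so the sketch computes the exact MAML gradient. With a deep network the same term is normally obtained by automatic differentiation or dropped in a first-order approximation.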

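    Table 1  Ablation study on the KITTI Eigen test set (results are evaluated only within 50 m). "avg." is the average score over all frames; "last 20%" is the average score over the last 20% of frames.

    | Method | Pre-training | RMSE (avg.) | Abs Rel (avg.) | Sq Rel (avg.) | ${\rm{RMSE}}_{\rm{log}}$ (avg.) | RMSE (last 20%) | Abs Rel (last 20%) | Sq Rel (last 20%) | ${\rm{RMSE}}_{\rm{log}}$ (last 20%) |
    |---|---|---|---|---|---|---|---|---|---|
    | Baseline | | 12.2012 | 0.4357 | 5.5672 | 1.3598 | 12.2874 | 0.4452 | 5.5213 | 1.3426 |
    | Baseline | Standard pre-training | 9.0518 | 0.2499 | 3.2901 | 0.9503 | 9.0309 | 0.2512 | 3.3104 | 0.9495 |
    | Online feature distribution alignment only | Standard pre-training | 3.6135 | 0.1250 | 0.6972 | 0.2041 | 3.5857 | 0.1031 | 0.6887 | 0.1910 |
    | OMLA | Standard pre-training | 3.5027 | 0.0923 | 0.6611 | 0.1896 | 3.3986 | 0.0882 | 0.6579 | 0.1735 |
    | Baseline | Meta pre-training | 8.8230 | 0.2305 | 3.0578 | 0.9324 | 8.7061 | 0.2273 | 2.9804 | 0.9065 |
    | Online feature distribution alignment only | Meta pre-training | 3.5043 | 0.0950 | 0.6627 | 0.1992 | 3.4831 | 0.0896 | 0.6545 | 0.1921 |
    | OMLA | Meta pre-training | ${\bf{3.4051}}$ | ${\bf{0.0864}}$ | ${\bf{0.6256}}$ | ${\bf{0.1852}}$ | ${\bf{3.3803}}$ | ${\bf{0.0798}}$ | ${\bf{0.6176}}$ | ${\bf{0.1801}}$ |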

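    Table 2  Comparison on different network architectures and datasets. The first four metric columns are for models pre-trained on Synthia[20]; the next four are for models pre-trained on Scene Flow Driving[41].

    | Network | Method | RMSE (Synthia) | Abs Rel (Synthia) | Sq Rel (Synthia) | ${\rm{RMSE}}_{\rm{log}}$ (Synthia) | RMSE (Scene Flow Driving) | Abs Rel (Scene Flow Driving) | Sq Rel (Scene Flow Driving) | ${\rm{RMSE}}_{\rm{log}}$ (Scene Flow Driving) | Frame rate (frames/s) |
    |---|---|---|---|---|---|---|---|---|---|---|
    | ResNet[9] | Baseline | 9.0518 | 0.2499 | 3.2901 | 0.8577 | 9.0893 | 0.2602 | 3.3896 | 0.8901 | ${\bf{5.06}}$ |
    | ResNet[9] | OMLA + meta pre-training | ${\bf{3.4051}}$ | ${\bf{0.0864}}$ | ${\bf{0.6256}}$ | ${\bf{0.1852}}$ | ${\bf{4.0573}}$ | ${\bf{0.1231}}$ | ${\bf{1.1532}}$ | ${\bf{0.1985}}$ | 3.40 |
    | MADNet[15] | Baseline | 8.8650 | 0.2684 | 3.1503 | 0.8233 | 8.9823 | 0.2790 | 3.3021 | 0.8350 | ${\bf{12.05}}$ |
    | MADNet[15] | OMLA + meta pre-training | ${\bf{4.0236}}$ | ${\bf{0.1756}}$ | ${\bf{1.1825}}$ | ${\bf{0.2501}}$ | ${\bf{4.2179}}$ | ${\bf{0.1883}}$ | ${\bf{1.2761}}$ | ${\bf{0.2523}}$ | 9.56 |
    | DispNet[41] | Baseline | 9.0222 | 0.2710 | 4.3281 | 0.9452 | 9.1587 | 0.2805 | 4.3590 | 0.9528 | ${\bf{5.42}}$ |
    | DispNet[41] | OMLA + meta pre-training | ${\bf{4.5201}}$ | ${\bf{0.2396}}$ | ${\bf{1.3104}}$ | ${\bf{0.2503}}$ | ${\bf{4.6314}}$ | ${\bf{0.2457}}$ | ${\bf{1.3541}}$ | ${\bf{0.2516}}$ | 4.00 |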

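    Table 3  Comparison with ideal models and the state-of-the-art method (only pixels whose actual depth is less than 50 m are evaluated)

    | Network | Online adaptation algorithm | Pre-training domain | RMSE | Abs Rel | Sq Rel | ${\rm{RMSE}}_{\rm{log}}$ | $\alpha>1.25$ | $\alpha>1.25^2$ | $\alpha>1.25^3$ |
    |---|---|---|---|---|---|---|---|---|---|
    | ResNet[9] | | Target domain | 3.6975 | 0.0983 | 1.1720 | 0.1923 | 0.9166 | 0.9580 | 0.9778 |
    | ResNet[9] | Baseline | Target domain | 3.4359 | ${\bf{0.0850}}$ | 0.6547 | 0.1856 | ${\bf{0.9203}}$ | 0.9612 | 0.9886 |
    | ResNet[9] | L2A[35] | Source domain | 3.5030 | 0.0913 | 0.6522 | ${\bf{0.1840}}$ | 0.9170 | 0.9611 | 0.9882 |
    | ResNet[9] | OMLA + meta pre-training | Source domain | ${\bf{3.4051}}$ | 0.0864 | ${\bf{0.6256}}$ | 0.1852 | 0.9170 | ${\bf{0.9623}}$ | ${\bf{0.9901}}$ |
    | MADNet[15] | | Target domain | ${\bf{3.8965}}$ | 0.1793 | 1.2369 | ${\bf{0.2457}}$ | 0.9147 | 0.9601 | 0.9790 |
    | MADNet[15] | Baseline | Target domain | 3.9023 | 0.1760 | 1.1902 | 0.2469 | ${\bf{0.9233}}$ | 0.9652 | 0.9813 |
    | MADNet[15] | L2A[35] | Source domain | 4.1506 | 0.1788 | 1.1935 | 0.2533 | 0.9131 | 0.9443 | 0.9786 |
    | MADNet[15] | OMLA + meta pre-training | Source domain | 4.0236 | ${\bf{0.1756}}$ | ${\bf{1.1825}}$ | 0.2501 | 0.9022 | ${\bf{0.9658}}$ | ${\bf{0.9842}}$ |
    | DispNet[41] | | Target domain | 4.5210 | 0.2433 | ${\bf{1.2801}}$ | ${\bf{0.2490}}$ | 0.9126 | 0.9472 | ${\bf{0.9730}}$ |
    | DispNet[41] | Baseline | Target domain | 4.5327 | ${\bf{0.2368}}$ | 1.2853 | 0.2506 | ${\bf{0.9178}}$ | ${\bf{0.9600}}$ | 0.9725 |
    | DispNet[41] | L2A[35] | Source domain | 4.6217 | 0.2410 | 1.2902 | 0.2593 | 0.9062 | 0.9513 | 0.9688 |
    | DispNet[41] | OMLA + meta pre-training | Source domain | ${\bf{4.5201}}$ | 0.2396 | 1.3104 | 0.2503 | 0.9085 | 0.9460 | 0.9613 |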
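    The tables report RMSE, Abs Rel, Sq Rel, RMSE_log and three threshold columns, but this page does not define them. The following sketch uses the definitions that are standard in the depth estimation literature and is an assumption about how the numbers were computed, not the authors' evaluation code. The function `depth_metrics`, its `max_depth` argument (set to the 50 m cap from the table captions), and the sample values are hypothetical. The last three columns, headed α>1.25^k in Table 3, appear to correspond to the usual threshold accuracy, that is, the fraction of pixels whose ratio max(pred/gt, gt/pred) is below 1.25^k, since their values lie close to 1.

```python
import math


def depth_metrics(pred, gt, max_depth=50.0):
    """Standard depth-error metrics over pixels with valid ground truth below max_depth."""
    # Keep only pixels with positive prediction and ground truth under the depth cap.
    pairs = [(p, g) for p, g in zip(pred, gt) if 0.0 < g < max_depth and p > 0.0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    n = len(pairs)
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse_log = math.sqrt(sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n)
    # Threshold accuracy: share of pixels whose depth ratio is below 1.25 ** k.
    acc = {
        k: sum(1 for p, g in pairs if max(p / g, g / p) < 1.25 ** k) / n
        for k in (1, 2, 3)
    }
    return {
        "rmse": rmse,
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse_log": rmse_log,
        "acc_1": acc[1],
        "acc_2": acc[2],
        "acc_3": acc[3],
    }


# Toy depths in metres; the third pixel is ignored because its ground truth exceeds 50 m.
pred = [9.0, 20.0, 60.0]
gt = [8.0, 30.0, 80.0]
metrics = depth_metrics(pred, gt)
```

    Lower is better for the four error metrics and higher is better for the threshold accuracies, which matches the pattern of bold entries in the tables.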
  • [1] Eigen D, Fergus R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 2650−2658
    [2] Laina I, Rupprecht C, Belagiannis V, Tombari F, Navab N. Deeper depth prediction with fully convolutional residual networks. In: Proceedings of the 4th International Conference on 3D Vision (3DV). Stanford, USA: IEEE, 2016. 239−248
    [3] Fu H, Gong M M, Wang C H, Batmanghelich K, Tao D C. Deep ordinal regression network for monocular depth estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 2002−2011
    [4] Xu D, Wang W, Tang H, Liu H, Sebe N, Ricci E. Structured attention guided convolutional neural fields for monocular depth estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 3917−3925
    [5] Zhang Z Y, Cui Z, Xu C Y, Jie Z Q, Li X, Yang J. Joint task-recursive learning for semantic segmentation and depth estimation. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 235−251
    [6] Wofk D, Ma F C, Yang T J, Karaman S, Sze V. FastDepth: Fast monocular depth estimation on embedded systems. In: Proceedings of the International Conference on Robotics and Automation (ICRA). Montreal, Canada: IEEE, 2019. 6101−6108
    [7] Ma F C, Karaman S. Sparse-to-dense: Depth prediction from sparse depth samples and a single image. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). Brisbane, Australia: IEEE, 2018. 1−8
    [8] Garg R, Kumar B G V, Carneiro G, Reid I. Unsupervised CNN for single view depth estimation: Geometry to the rescue. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer, 2016. 740−756
    [9] Godard C, Aodha O M, Brostow G J. Unsupervised monocular depth estimation with left-right consistency. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 270−279
    [10] Pilzer A, Xu D, Puscas M, Ricci E, Sebe N. Unsupervised adversarial depth estimation using cycled generative networks. In: Proceedings of the International Conference on 3D Vision (3DV). Verona, Italy: IEEE, 2018. 587−595
    [11] Pillai S, Ambruş R, Gaidon A. SuperDepth: Self-supervised, super-resolved monocular depth estimation. In: Proceedings of the International Conference on Robotics and Automation (ICRA). Montreal, Canada: IEEE, 2019. 9250−9256
    [12] Mancini M, Karaoguz H, Ricci E, Jensfelt P, Caputo B. Kitting in the wild through online domain adaptation. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Madrid, Spain: IEEE, 2018. 1103−1109
    [13] Carlucci F M, Porzi L, Caputo B, Ricci E, Bulò S R. AutoDIAL: Automatic domain alignment layers. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 5077−5085
    [14] Menze M, Geiger A. Object scene flow for autonomous vehicles. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE, 2015. 3061−3070
    [15] Tonioni A, Tosi F, Poggi M, Mattoccia S, Di Stefano L. Real-time self-adaptive deep stereo. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 195−204
    [16] Liu F Y, Shen C H, Lin G S, Reid I. Learning depth from single monocular images using deep convolutional neural fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(10): 2024−2039 doi: 10.1109/TPAMI.2015.2505283
    [17] Yang N, Wang R, Stückler J, Cremers D. Deep virtual stereo odometry: Leveraging deep depth prediction for monocular direct sparse odometry. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 835−852
    [18] Silberman N, Hoiem D, Kohli P, Fergus R. Indoor segmentation and support inference from RGBD images. In: Proceedings of the 12th European Conference on Computer Vision. Florence, Italy: Springer, 2012. 746−760
    [19] Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE, 2012. 3354−3361
    [20] Ros G, Sellart L, Materzynska J, Vazquez D, Lopez A M. The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 3234−3243
    [21] Zhou T H, Brown M, Snavely N, Lowe D G. Unsupervised learning of depth and ego-motion from video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 1851−1858
    [22] Kundu J N, Uppala P K, Pahuja A, Babu R V. AdaDepth: Unsupervised content congruent adaptation for depth estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 2656−2665
    [23] Csurka G. Domain adaptation for visual applications: A comprehensive survey. arXiv preprint arXiv: 1702.05374, 2017.
    [24] Long M S, Zhu H, Wang J M, Jordan M I. Deep transfer learning with joint adaptation networks. In: Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: JMLR.org, 2017. 2208−2217
    [25] Venkateswara H, Eusebio J, Chakraborty S, Panchanathan S. Deep hashing network for unsupervised domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 5018−5027
    [26] Bousmalis K, Irpan A, Wohlhart P, Bai Y F, Kelcey M, Kalakrishnan M, et al. Using simulation and domain adaptation to improve efficiency of deep robotic grasping. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). Brisbane, Australia: IEEE, 2018. 4243−4250
    [27] Sankaranarayanan S, Balaji Y, Castillo C D, Chellappa R. Generate to adapt: Aligning domains using generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 8503−8512
    [28] Wulfmeier M, Bewley A, Posner I. Incremental adversarial domain adaptation for continually changing environments. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). Brisbane, Australia: IEEE, 2018. 1−9
    [29] Li Y H, Wang N Y, Shi J P, Liu J Y, Hou X D. Revisiting batch normalization for practical domain adaptation. In: Proceedings of the 5th International Conference on Learning Representations. Toulon, France: OpenReview.net, 2017.
    [30] Mancini M, Porzi L, Bulò S R, Caputo B, Ricci E. Boosting domain adaptation by discovering latent domains. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 3771−3780
    [31] Tobin J, Fong R, Ray A, Schneider J, Zaremba W, Abbeel P. Domain randomization for transferring deep neural networks from simulation to the real world. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Vancouver, Canada: IEEE, 2017. 23−30
    [32] Tonioni A, Poggi M, Mattoccia S, Di Stefano L. Unsupervised adaptation for deep stereo. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 1605−1613
    [33] Pang J H, Sun W X, Yang C X, Ren J, Xiao R C, Zeng J, et al. Zoom and learn: Generalizing deep stereo matching to novel domains. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 2070−2079
    [34] Zhao S S, Fu H, Gong M M, Tao D C. Geometry-aware symmetric domain adaptation for monocular depth estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 9788−9798
    [35] Tonioni A, Rahnama O, Joy T, Di Stefano L, Ajanthan T, Torr P H S. Learning to adapt for stereo. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 9661−9670
    [36] Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K, Wierstra D. Matching networks for one shot learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: Curran Associates Inc., 2016. 3637−3645
    [37] Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: JMLR.org, 2017. 1126−1135
    [38] Guo Y H, Shi H H, Kumar A, Grauman K, Rosing T, Feris R. SpotTune: Transfer learning through adaptive fine-tuning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 4805−4814
    [39] Park E, Berg A C. Meta-tracker: Fast and robust online adaptation for visual object trackers. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 587−604
    [40] Wang P, Shen X H, Lin Z, Cohen S, Price B, Yuille A. Towards unified depth and semantic prediction from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE, 2015. 2800−2809
    [41] Mayer N, Ilg E, Häusser P, Fischer P, Cremers D, Dosovitskiy A, et al. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 4040−4048
    [42] Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, et al. Automatic differentiation in PyTorch. In: Proceedings of the 31st Conference on Neural Information Processing Systems. Long Beach, USA: 2017.
    [43] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: JMLR.org, 2015. 448−456
    [44] Kingma D P, Ba J. Adam: A method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: 2015.
Publication history
  • Received:  2020-05-07
  • Accepted:  2020-09-14
  • Published online:  2021-07-05
  • Issue date:  2023-07-20
