

Li Lei, Xu Hao, Wu Su-Ping. Fuzzy probability points reasoning for 3D reconstruction via deep deterministic policy gradient. Acta Automatica Sinica, 2022, 48(4): 1105−1118. doi: 10.16383/j.aas.c200543

Fuzzy Probability Points Reasoning for 3D Reconstruction via Deep Deterministic Policy Gradient

Funds: Supported by National Natural Science Foundation of China (62062056, 61662059)
More Information
    Author Bio:

    LI Lei  Master student at the School of Information Engineering, Ningxia University. His research interests cover 3D object reconstruction, face reconstruction and landmark alignment, image processing, computer vision, and pattern recognition. E-mail: lliicnxu@163.com

    XU Hao  Master student at the School of Information Engineering, Ningxia University. His research interests cover computer vision and 3D human pose estimation. E-mail: hao_xu321@163.com

    WU Su-Ping  Professor at the School of Information Engineering, Ningxia University. Her research interests cover 3D reconstruction, computer vision, pattern recognition, parallel distributed processing, and big data. Corresponding author of this paper. E-mail: pswuu@nxu.edu.cn

  • Abstract: Single-view 3D object reconstruction is a long-standing and challenging problem. Objects with complex topological structures and high-fidelity surface details remain difficult to recover accurately. To address this, this paper proposes a method based on the deep reinforcement learning algorithm deep deterministic policy gradient (DDPG) to re-reason the fuzzy probability points in 3D reconstruction, achieving single-view 3D reconstruction with high fidelity and rich details. The method is end-to-end and consists of four parts: the learning process of a dynamic branch compensation network that fits the 3D shape of the object, a neighborhood routing mechanism that aggregates the points around each fuzzy probability point, attention-guided information aggregation, and fuzzy probability adjustment based on the deep reinforcement learning algorithm. Extensive experiments on a public large-scale 3D shape dataset demonstrate the correctness and effectiveness of the method. By combining reinforcement learning with deep learning and aggregating the local information around fuzzy probability points together with global image information, the proposed method effectively improves the model's ability to reconstruct complex topological structures and high-fidelity surface details.
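The abstract's core idea — singling out the network's uncertain occupancy predictions ("fuzzy probability points") and applying a learned correction to them — can be illustrated with a minimal sketch. Everything below (the threshold `tau`, the function names, and the bounded-correction form of the agent's action) is an illustrative assumption, not the paper's actual MNGD implementation:

```python
import numpy as np

def find_fuzzy_points(occ_prob: np.ndarray, tau: float = 0.1) -> np.ndarray:
    """Indices of points whose occupancy probability lies within tau of the
    0.5 decision threshold -- the 'fuzzy probability points' to re-reason."""
    return np.flatnonzero(np.abs(occ_prob - 0.5) < tau)

def adjust_fuzzy_points(occ_prob: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """Apply bounded corrections (e.g. an actor network's scaled tanh output)
    to the fuzzy points only; confident predictions are left unchanged."""
    idx = find_fuzzy_points(occ_prob)
    out = occ_prob.copy()
    out[idx] = np.clip(out[idx] + actions[:len(idx)], 0.0, 1.0)
    return out

probs = np.array([0.02, 0.48, 0.97, 0.55, 0.10])
# Only indices 1 and 3 lie within 0.1 of the 0.5 threshold
print(find_fuzzy_points(probs))
```

In the paper the corrections come from a DDPG actor trained against a reconstruction-quality reward; here they are simply passed in as an array.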
  • Fig. 1  Three representation shapes for single-view 3D reconstruction based on deep learning

    Fig. 2  Single-view reconstruction results of our method and DISN on real images

    Fig. 3  The workflow of the proposed MNGD framework

    Fig. 4  The framework of the dynamic branch compensation network

    Fig. 5  The whole process of neighbor routing

    Fig. 6  Attention mechanism when features are aggregated

    Fig. 7  Convolution visualization and mesh generation process

    Fig. 8  Qualitative results on the ShapeNet dataset

    Fig. 9  Qualitative results on the Online Products dataset

    Fig. 10  Qualitative results of the ablation study

    Fig. 11  The result of MNGD adjusting the fuzzy probability points in 100 random images

    Fig. 12  Qualitative results on ShapeNet of all categories

    Fig. 13  Challenging cases in single-view 3D reconstruction

    Table 1  Quantitative comparison of our method with state-of-the-art methods in terms of IoU on the ShapeNet dataset

    Category\Method   3D-R2N2   Pix2Mesh   AtlasNet   ONet    Ours
    Airplane          0.426     0.420      –          0.571   0.592
    Bench             0.373     0.323      –          0.485   0.503
    Cabinet           0.667     0.664      –          0.733   0.757
    Car               0.661     0.552      –          0.737   0.755
    Chair             0.439     0.396      –          0.501   0.542
    Display           0.440     0.490      –          0.471   0.548
    Lamp              0.281     0.323      –          0.371   0.409
    Loudspeaker       0.611     0.599      –          0.647   0.672
    Rifle             0.375     0.402      –          0.474   0.500
    Sofa              0.626     0.613      –          0.680   0.701
    Table             0.420     0.395      –          0.506   0.547
    Telephone         0.611     0.661      –          0.720   0.763
    Vessel            0.482     0.397      –          0.530   0.569
    Mean              0.493     0.480      –          0.571   0.605
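The IoU values in Table 1 are the intersection of the predicted and ground-truth occupancy volumes divided by their union. As a minimal illustrative sketch (not the paper's exact evaluation code), IoU between two binary voxel grids can be computed as:

```python
import numpy as np

def voxel_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Volumetric IoU between two binary occupancy grids of equal shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Two empty shapes agree perfectly by convention
    return float(inter) / float(union) if union > 0 else 1.0

# Toy example: two 2x2x2 grids with 4 occupied voxels each,
# overlapping in 2 voxels -> IoU = 2 / 6 = 1/3
a = np.zeros((2, 2, 2), dtype=bool); a[0] = True
b = np.zeros((2, 2, 2), dtype=bool); b[:, 0] = True
print(voxel_iou(a, b))
```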

    Table 2  Quantitative comparison of our method with state-of-the-art methods in terms of normal consistency (NC) on the ShapeNet dataset

    Category\Method   3D-R2N2   Pix2Mesh   AtlasNet   ONet    Ours
    Airplane          0.629     0.759      0.836      0.840   0.847
    Bench             0.678     0.732      0.779      0.813   0.818
    Cabinet           0.782     0.834      0.850      0.879   0.887
    Car               0.714     0.756      0.836      0.852   0.855
    Chair             0.663     0.746      0.791      0.823   0.835
    Display           0.720     0.830      0.858      0.854   0.871
    Lamp              0.560     0.666      0.694      0.731   0.751
    Loudspeaker       0.711     0.782      0.825      0.832   0.845
    Rifle             0.670     0.718      0.725      0.766   0.781
    Sofa              0.731     0.820      0.840      0.863   0.872
    Table             0.732     0.784      0.832      0.858   0.864
    Telephone         0.817     0.907      0.923      0.935   0.938
    Vessel            0.629     0.699      0.756      0.794   0.801
    Mean              0.695     0.772      0.811      0.834   0.844
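The normal consistency (NC) scores in Table 2 measure how well predicted surface normals agree with ground-truth normals. A minimal sketch of the core computation, assuming the two normal arrays are already nearest-neighbor matched and unit-length (the full protocol also averages both matching directions):

```python
import numpy as np

def normal_consistency(normals_pred: np.ndarray, normals_gt: np.ndarray) -> float:
    """Mean absolute cosine similarity between matched unit normals.

    Both inputs are (N, 3) arrays of unit vectors; row i of one array is
    assumed to be the nearest-neighbor match of row i of the other.
    """
    return float(np.mean(np.abs(np.sum(normals_pred * normals_gt, axis=1))))

# Identical normals contribute 1.0, orthogonal normals contribute 0.0
na = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
nb = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(normal_consistency(na, nb))  # (1 + 0) / 2 = 0.5
```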

    Table 3  Quantitative comparison of our method with state-of-the-art methods in terms of Chamfer distance (CD) on the ShapeNet dataset

    Category\Method   3D-R2N2   Pix2Mesh   AtlasNet   ONet    Ours
    Airplane          0.227     0.187      0.104      0.147   0.130
    Bench             0.194     0.201      0.138      0.155   0.149
    Cabinet           0.217     0.196      0.175      0.167   0.146
    Car               0.213     0.180      0.141      0.159   0.144
    Chair             0.270     0.265      0.209      0.228   0.200
    Display           0.314     0.239      0.198      0.278   0.220
    Lamp              0.778     0.308      0.305      0.479   0.364
    Loudspeaker       0.318     0.285      0.245      0.300   0.263
    Rifle             0.183     0.164      0.115      0.141   0.130
    Sofa              0.229     0.212      0.177      0.194   0.179
    Table             0.239     0.218      0.190      0.189   0.170
    Telephone         0.195     0.149      0.128      0.140   0.121
    Vessel            0.238     0.212      0.151      0.218   0.189
    Mean              0.278     0.216      0.175      0.215   0.185
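The Chamfer distance (CD) in Table 3 measures the average nearest-neighbor distance between points sampled from the predicted and ground-truth surfaces (lower is better). One common symmetric definition, sketched with brute-force pairwise distances (the paper's exact variant — L1 vs. L2, squaring, normalization — is not specified here):

```python
import numpy as np

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3):
    mean nearest-neighbor distance from p to q plus from q to p."""
    # (N, M) matrix of all pairwise Euclidean distances via broadcasting
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

p = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
q = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 1.0]])
# p->q nearest distances are [0, 1], q->p are [0, 1] -> 0.5 + 0.5 = 1.0
print(chamfer_distance(p, q))
```

For real evaluations with many thousands of points, a k-d tree (e.g. `scipy.spatial.cKDTree`) replaces the quadratic pairwise matrix.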

    Table 4  Ablation study

    Model\Metric      IoU     NC      CD
    FM w/o DR, MB     0.593   0.840   0.194
    FM w/o MB         0.599   0.839   0.194
    FM                0.605   0.844   0.185
  • [1] Chen Jia, Zhang Yu-Qi, Song Peng, Wei Yan-Tao, Wang Yu. Application of deep learning to 3D object reconstruction from a single image. Acta Automatica Sinica, 2019, 45(4): 657-668
    [2] Zheng Tai-Xiong, Huang Shuai, Li Yong-Fu, Feng Ming-Chi. Key techniques for vision based 3D reconstruction: A review. Acta Automatica Sinica, 2020, 46(4): 631-652
    [3] Xue Jun-Shi, Yi Hui, Wu Zhi-Huan, Chen Xiang-Ning. A hybrid multi-view 3D reconstruction method based on scene graph partition. Acta Automatica Sinica, 2020, 46(4): 782-795
    [4] Wu J J, Zhang C K, Xue T F, Freeman W T, Tenenbaum J B. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: Curran Associates, Inc., 2016. 82−90
    [5] Choy C B, Xu D F, Gwak J Y, Chen K, Savarese S. 3D-R2N2: A unified approach for single and multi-view 3D object reconstruction. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016. 628−644
    [6] Yao Y, Luo Z X, Li S W, Fang T, Quan L. MVSNet: Depth inference for unstructured multi-view stereo. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 785−801
    [7] Wu J J, Wang Y F, Xue T F, Sun X Y, Freeman W T, Tenenbaum J B. MarrNet: 3D shape reconstruction via 2.5D sketches. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates, Inc., 2017. 540−550
    [8] Fan H Q, Su H, Guibas L. A point set generation network for 3D object reconstruction from a single image. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 2463−2471
    [9] Wang N Y, Zhang Y D, Li Z W, Fu Y W, Liu W, Jiang Y G. Pixel2Mesh: Generating 3D mesh models from single RGB images. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 55−71
    [10] Scarselli F, Gori M, Tsoi A C, Hagenbuchner M, Monfardini G. The graph neural network model. IEEE Transactions on Neural Networks, 2009, 20(1): 61-80 doi: 10.1109/TNN.2008.2005605
    [11] Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors. Nature, 1986, 323(6088): 533-536 doi: 10.1038/323533a0
    [12] Roth S, Richter S R. Matryoshka networks: Predicting 3D geometry via nested shape layers. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 1936−1944
    [13] Wu J J, Zhang C K, Zhang X M, Zhang Z T, Freeman W T, Tenenbaum J B. Learning shape priors for single-view 3D completion and reconstruction. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 673−691
    [14] Groueix T, Fisher M, Kim V G, Russell B C, Aubry M. A Papier-Mache approach to learning 3D surface generation. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 216−224
    [15] Kanazawa A, Black M J, Jacobs D W, Malik J. End-to-end recovery of human shape and pose. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 7122−7131
    [16] Kong C, Lin C H, Lucey S. Using locally corresponding CAD models for dense 3D reconstructions from a single image. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 5603−5611
    [17] Mescheder L, Oechsle M, Niemeyer M, Nowozin S, Geiger A. Occupancy networks: Learning 3D reconstruction in function space. In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 4455−4465
    [18] Lillicrap T P, Hunt J J, Pritzel A, Heess N, Erez T, Tassa Y, et al. Continuous control with deep reinforcement learning [Online], available: https://arxiv.org/abs/1509.02971, July 5, 2019
    [19] Li D, Chen Q F. Dynamic hierarchical mimicking towards consistent optimization objectives. In: Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 7639−7648
    [20] Chang A X, Funkhouser T, Guibas L, et al. ShapeNet: An information-rich 3D model repository [Online], available: https://arxiv.org/abs/1512.03012, December 9, 2015
    [21] Durou J D, Falcone M, Sagona M. Numerical methods for shape-from-shading: A new survey with benchmarks. Computer Vision and Image Understanding, 2008, 109(1): 22-43 doi: 10.1016/j.cviu.2007.09.003
    [22] Zhang R, Tsai P S, Cryer J E, Shah M. Shape-from-shading: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, 21(8): 690-706 doi: 10.1109/34.784284
    [23] Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. In: Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press, 2014. 2672−2680
    [24] Kingma D P, Welling M. Auto-encoding variational Bayes [Online], available: https://arxiv.org/abs/1312.6114, May 1, 2014
    [25] Kar A, Hane C, Malik J. Learning a multi-view stereo machine. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates, Inc., 2017. 364−375
    [26] Tatarchenko M, Dosovitskiy A, Brox T. Octree generating networks: Efficient convolutional architectures for high-resolution 3D outputs. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 2107−2115
    [27] Wang W Y, Ceylan D, Mech R, Neumann U. 3DN: 3D deformation network. In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 1038−1046
    [28] Bernardini F, Mittleman J, Rushmeier H, Silva C, Taubin G. The ball-pivoting algorithm for surface reconstruction. IEEE Transactions on Visualization and Computer Graphics, 1999, 5(4): 349-359 doi: 10.1109/2945.817351
    [29] Kazhdan M, Hoppe H. Screened poisson surface reconstruction. ACM Transactions on Graphics, 2013, 32(3): Article No. 29
    [30] Calakli F, Taubin G. SSD: Smooth signed distance surface reconstruction. Computer Graphics Forum, 2011, 30(7): 1993-2002 doi: 10.1111/j.1467-8659.2011.02058.x
    [31] Chen Z Q, Zhang H. Learning implicit fields for generative shape modeling. In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 5932−5941
    [32] Wang W Y, Xu Q G, Ceylan D, Mech R, Neumann U. DISN: Deep implicit surface network for high-quality single-view 3D reconstruction. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates, Inc., 2019. Article No. 45
    [33] Wang Q L, Wu B G, Zhu P F, Li P H, Zuo W M, Hu Q H. ECA-Net: Efficient channel attention for deep convolutional neural networks. In: Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 11531−11539
    [34] Selvaraju R R, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 618−626
    [35] Garland M, Heckbert P S. Simplifying surfaces with color and texture using quadric error metrics. In: Proceedings of Visualization '98. Research Triangle Park, USA: IEEE, 1998. 263−269
    [36] Lorensen W E, Cline H E. Marching cubes: A high resolution 3D surface construction algorithm. ACM SIGGRAPH Computer Graphics, 1987, 21(4): 163-169 doi: 10.1145/37402.37422
    [37] Drucker H, Le Cun Y. Improving generalization performance using double backpropagation. IEEE Transactions on Neural Networks, 1992, 3(6): 991-997 doi: 10.1109/72.165600
    [38] Oh Song H, Xiang Y, Jegelka S, Savarese S. Deep metric learning via lifted structured feature embedding. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 4004−4012
    [39] Stutz D, Geiger A. Learning 3D shape completion from laser scan data with weak supervision. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 1955−1964
    [40] de Vries H, Strub F, Mary J, Larochelle H, Pietquin O, Courville A C. Modulating early visual processing by language. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates, Inc., 2017. 6597−6607
    [41] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 770−778
    [42] Kingma D P, Ba J. Adam: A method for stochastic optimization [Online], available: https://arxiv.org/abs/1412.6980, January 30, 2017
    [43] Zhu C C, Liu H, Yu Z H, Sun X H. Towards Omni-supervised face alignment for large scale unlabeled videos. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 13090−13097
    [44] Zhu C C, Li X Q, Li J D, Ding G T, Tong W Q. Spatial-temporal knowledge integration: Robust self-supervised facial landmark tracking. In: Proceedings of the 28th ACM International Conference on Multimedia. Lisboa, Portugal: ACM, 2020. 4135−4143
Figures (13) / Tables (4)
Publication history
  • Received: 2020-07-13
  • Revised: 2020-12-05
  • Available online: 2021-03-02
  • Issue date: 2022-04-13
