Fuzzy Probability Points Reasoning for 3D Reconstruction via Deep Deterministic Policy Gradient

Li Lei, Xu Hao, Wu Su-Ping

Citation: Li Lei, Xu Hao, Wu Su-Ping. Fuzzy probability points reasoning for 3D reconstruction via deep deterministic policy gradient. Acta Automatica Sinica, 2021, x(x): 1−14. doi: 10.16383/j.aas.c200543


Fuzzy Probability Points Reasoning for 3D Reconstruction via Deep Deterministic Policy Gradient

Funds: Supported by National Natural Science Foundation of China (62062056, 61662059)
More Information
    Author Bio:

    LI Lei Master student of the School of Information Engineering, Ningxia University. His research interest covers 3D object reconstruction, face reconstruction and landmark alignment, image processing, computer vision and pattern recognition

    XU Hao Master student of the School of Information Engineering, Ningxia University. His research interest covers computer vision and 3D human pose estimation

    WU Su-Ping Professor of the School of Information Engineering, Ningxia University. Her research interest covers 3D reconstruction, computer vision, pattern recognition, parallel distributed processing and big data. Corresponding author of this paper

  • Abstract: Single-view 3D object reconstruction is a long-standing and challenging problem; objects with complex topology and some high-fidelity surface details are still difficult to recover accurately. To address this, this paper applies a deep reinforcement learning algorithm, deep deterministic policy gradient (DDPG), to re-reason the fuzzy probability points in 3D reconstruction, achieving single-view 3D reconstruction with high fidelity and rich detail. The method is end-to-end and consists of four parts: the learning process of a dynamic branch compensation network that fits the object's 3D shape, a neighbor routing mechanism that aggregates the points around each fuzzy probability point, attention-guided information aggregation, and fuzzy probability adjustment based on the deep reinforcement learning algorithm. Extensive experiments on a public large-scale 3D shape dataset demonstrate the correctness and effectiveness of the proposed method. By combining reinforcement learning with deep learning and aggregating both the local information around fuzzy probability points and the global image information, the method effectively improves the reconstruction of complex topological structures and high-fidelity details.
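To make the notion of a fuzzy probability point concrete, here is a minimal sketch (an illustration, not the paper's implementation): query points whose predicted occupancy probability falls near the 0.5 decision boundary are flagged as fuzzy, and a brute-force nearest-neighbor search stands in for the neighbor routing mechanism. The function names, the 0.15 band, and k = 4 are all assumptions made for this example.

```python
import numpy as np

def find_fuzzy_points(occupancy, band=0.15):
    """Return indices of query points whose predicted occupancy
    probability lies near the 0.5 decision boundary (hypothetical
    criterion; the paper's exact selection rule may differ)."""
    occupancy = np.asarray(occupancy, dtype=float)
    return np.nonzero(np.abs(occupancy - 0.5) < band)[0]

def neighbor_route(points, fuzzy_idx, k=4):
    """For each fuzzy point, gather its k nearest neighbors -- a
    brute-force stand-in for the paper's neighbor routing mechanism."""
    points = np.asarray(points, dtype=float)
    routed = {}
    for i in fuzzy_idx:
        d = np.linalg.norm(points - points[i], axis=1)
        d[i] = np.inf                      # exclude the point itself
        routed[i] = np.argsort(d)[:k]      # indices of the k nearest neighbors
    return routed

probs = np.array([0.02, 0.48, 0.97, 0.55, 0.91])
fuzzy = find_fuzzy_points(probs)
print(fuzzy.tolist())  # -> [1, 3]
```

In the paper's pipeline, the features of these routed neighbors would then be aggregated under an attention mechanism and each fuzzy point's probability adjusted by the DDPG agent.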
  • Fig.  1  Three representation shapes for single-view 3D reconstruction based on deep learning

    Fig.  2  Single-image reconstruction on real images: (a) input image; (b) DISN result; (c) result of our method

    Fig.  3  The workflow of the proposed MNGD framework

    Fig.  4  The framework of the dynamic branch compensation network

    Fig.  5  The whole process of neighbor routing

    Fig.  6  Attention mechanism when features are aggregated

    Fig.  7  Convolution visualization and mesh generation process

    Fig.  8  Qualitative results on the ShapeNet dataset

    Fig.  9  Qualitative results on the Online Products dataset

    Fig.  10  Qualitative results of the ablation study

    Fig.  11  The result of MNGD adjusting the fuzzy probability points in 100 random images

    Fig.  12  Qualitative results on ShapeNet for all categories

    Fig.  13  Challenging cases in single-view 3D reconstruction

    Table  1  The quantitative comparison of our method with the state-of-the-art methods for IoU on the ShapeNet dataset ("–": not reported)

    Category    | 3D-R2N2 | Pix2Mesh | AtlasNet | ONet  | Ours
    Airplane    | 0.426   | 0.420    | –        | 0.571 | 0.592
    Bench       | 0.373   | 0.323    | –        | 0.485 | 0.503
    Cabinet     | 0.667   | 0.664    | –        | 0.733 | 0.757
    Car         | 0.661   | 0.552    | –        | 0.737 | 0.755
    Chair       | 0.439   | 0.396    | –        | 0.501 | 0.542
    Display     | 0.440   | 0.490    | –        | 0.471 | 0.548
    Lamp        | 0.281   | 0.323    | –        | 0.371 | 0.409
    Loudspeaker | 0.611   | 0.599    | –        | 0.647 | 0.672
    Rifle       | 0.375   | 0.402    | –        | 0.474 | 0.500
    Sofa        | 0.626   | 0.613    | –        | 0.680 | 0.701
    Table       | 0.420   | 0.395    | –        | 0.506 | 0.547
    Telephone   | 0.611   | 0.661    | –        | 0.720 | 0.763
    Vessel      | 0.482   | 0.397    | –        | 0.530 | 0.569
    Mean        | 0.493   | 0.480    | –        | 0.571 | 0.605
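Volumetric IoU, the metric in Table 1, compares two occupancy grids by the ratio of their intersection to their union. A minimal sketch of the standard definition:

```python
import numpy as np

def voxel_iou(pred, gt):
    """Volumetric IoU between two boolean occupancy grids:
    |intersection| / |union|."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0

a = np.zeros((4, 4, 4), bool); a[:2] = True   # 32 occupied voxels
b = np.zeros((4, 4, 4), bool); b[1:3] = True  # 32 occupied voxels, 16 shared
print(voxel_iou(a, b))  # -> 0.3333... (16 / 48)
```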

    Table  2  The quantitative comparison of our method with the state-of-the-art methods for normal consistency (NC) on the ShapeNet dataset

    Category    | 3D-R2N2 | Pix2Mesh | AtlasNet | ONet  | Ours
    Airplane    | 0.629   | 0.759    | 0.836    | 0.840 | 0.847
    Bench       | 0.678   | 0.732    | 0.779    | 0.813 | 0.818
    Cabinet     | 0.782   | 0.834    | 0.850    | 0.879 | 0.887
    Car         | 0.714   | 0.756    | 0.836    | 0.852 | 0.855
    Chair       | 0.663   | 0.746    | 0.791    | 0.823 | 0.835
    Display     | 0.720   | 0.830    | 0.858    | 0.854 | 0.871
    Lamp        | 0.560   | 0.666    | 0.694    | 0.731 | 0.751
    Loudspeaker | 0.711   | 0.782    | 0.825    | 0.832 | 0.845
    Rifle       | 0.670   | 0.718    | 0.725    | 0.766 | 0.781
    Sofa        | 0.731   | 0.820    | 0.840    | 0.863 | 0.872
    Table       | 0.732   | 0.784    | 0.832    | 0.858 | 0.864
    Telephone   | 0.817   | 0.907    | 0.923    | 0.935 | 0.938
    Vessel      | 0.629   | 0.699    | 0.756    | 0.794 | 0.801
    Mean        | 0.695   | 0.772    | 0.811    | 0.834 | 0.844
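Normal consistency (Table 2) scores how well surface orientation is recovered: each point's unit normal is compared, via the absolute cosine, with the normal of its nearest neighbor on the other surface, in both directions. A simplified brute-force sketch (real evaluations typically sample thousands of points and use a KD-tree):

```python
import numpy as np

def normal_consistency(p1, n1, p2, n2):
    """Symmetric normal consistency between two oriented point sets:
    for each point, take |cos| of the angle between its unit normal and
    the normal of its nearest neighbor in the other set, then average."""
    def one_way(pa, na, pb, nb):
        # nearest neighbor in b for every point of a (brute force)
        d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=2)
        nn = d.argmin(axis=1)
        cos = np.abs((na * nb[nn]).sum(axis=1))  # |dot| of unit normals
        return cos.mean()
    return 0.5 * (one_way(p1, n1, p2, n2) + one_way(p2, n2, p1, n1))

p = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
n = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
print(normal_consistency(p, n, p, n))  # -> 1.0 (identical surfaces)
```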

    Table  3  The quantitative comparison of our method with the state-of-the-art methods for Chamfer distance (CD) on the ShapeNet dataset

    Category    | 3D-R2N2 | Pix2Mesh | AtlasNet | ONet  | Ours
    Airplane    | 0.227   | 0.187    | 0.104    | 0.147 | 0.130
    Bench       | 0.194   | 0.201    | 0.138    | 0.155 | 0.149
    Cabinet     | 0.217   | 0.196    | 0.175    | 0.167 | 0.146
    Car         | 0.213   | 0.180    | 0.141    | 0.159 | 0.144
    Chair       | 0.270   | 0.265    | 0.209    | 0.228 | 0.200
    Display     | 0.314   | 0.239    | 0.198    | 0.278 | 0.220
    Lamp        | 0.778   | 0.308    | 0.305    | 0.479 | 0.364
    Loudspeaker | 0.318   | 0.285    | 0.245    | 0.300 | 0.263
    Rifle       | 0.183   | 0.164    | 0.115    | 0.141 | 0.130
    Sofa        | 0.229   | 0.212    | 0.177    | 0.194 | 0.179
    Table       | 0.239   | 0.218    | 0.190    | 0.189 | 0.170
    Telephone   | 0.195   | 0.149    | 0.128    | 0.140 | 0.121
    Vessel      | 0.238   | 0.212    | 0.151    | 0.218 | 0.189
    Mean        | 0.278   | 0.216    | 0.175    | 0.215 | 0.185
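Chamfer distance (Table 3) measures how far apart two sampled surfaces are: each point is matched to its nearest neighbor in the other set and the distances are averaged. Conventions vary (squared vs. unsquared distances, sum vs. mean over the two directions), so the sketch below shows just one common variant:

```python
import numpy as np

def chamfer_distance(p1, p2):
    """Symmetric Chamfer distance between two point sets: the mean
    nearest-neighbor distance in each direction, summed (one common
    variant; papers differ on squaring and normalization)."""
    # pairwise distance matrix, shape (len(p1), len(p2))
    d = np.linalg.norm(p1[:, None, :] - p2[None, :, :], axis=2)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

a = np.array([[0.0, 0.0], [1.0, 0.0]])
b = np.array([[0.0, 0.5], [1.0, 0.5]])
print(chamfer_distance(a, b))  # -> 1.0 (0.5 in each direction)
```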

    Table  4  Ablation study

    Model         | IoU   | NC    | CD
    FM w/o DR, MB | 0.593 | 0.840 | 0.194
    FM w/o MB     | 0.599 | 0.839 | 0.194
    FM            | 0.605 | 0.844 | 0.185
  • [1] Chen Jia, Zhang Yu-Qi, Song Peng, Wei Yan-Tao, Wang Yu. Application of deep learning to 3D object reconstruction from a single image. Acta Automatica Sinica, 2019, 45(4): 657−668
    [2] Zheng Tai-Xiong, Huang Shuai, Li Yong-Fu, Feng Ming-Chi. Key techniques for vision based 3D reconstruction: A review. Acta Automatica Sinica, 2020, 46(4): 631−652
    [3] Xue Jun-Shi, Yi Hui, Wu Zhi-Huan, Chen Xiang-Ning. A hybrid multi-view 3D reconstruction method based on scene graph partition. Acta Automatica Sinica, 2020, 46(4): 782−795
    [4] Wu J J, Zhang C K, Xue T F, Freeman W T, Tenenbaum J B. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In: Proceedings of the 30th Conference on Neural Information Processing Systems. New York, USA: Curran Associates, Inc., 2016. 82−90
    [5] Choy C B, Xu D F, Gwak J Y, Chen K, Savarese S. 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer, 2016. 628−644
    [6] Yao Y, Luo Z X, Li S W, Fang T, Quan L. MVSNet: Depth inference for unstructured multi-view stereo. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 767−783
    [7] Wu J J, Wang Y F, Xue T F, Sun X Y, Freeman W T, Tenenbaum J B. MarrNet: 3D shape reconstruction via 2.5D sketches. In: Proceedings of the 2017 Advances in Neural Information Processing Systems. Long Beach, CA, USA: Curran Associates, Inc., 2017. 8−15
    [8] Fan H Q, Su H, Guibas L. A point set generation network for 3D object reconstruction from a single image. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, Hawaii, USA: IEEE, 2017. 2463−2471
    [9] Wang N Y, Zhang Y D, Li Z W, Fu Y W, Liu W, Jiang Y G. Pixel2Mesh: Generating 3D mesh models from single RGB images. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 52−67
    [10] Scarselli F, Gori M, Tsoi A C, Hagenbuchner M, Monfardini G. The graph neural network model. IEEE Transactions on Neural Networks, 2008, 20(1): 61−80
    [11] Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors. Nature, 1986, 323(6088): 533−536 doi: 10.1038/323533a0
    [12] Richter S R, Roth S. Matryoshka networks: Predicting 3D geometry via nested shape layers. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, Utah, USA: IEEE, 2018. 1936−1944
    [13] Wu J J, Zhang C K, Zhang X M, Zhang Z T, Freeman W T, Tenenbaum J B. Learning shape priors for single-view 3D completion and reconstruction. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 646−662
    [14] Groueix T, Fisher M, Kim V G, Russell B C, Aubry M. A papier-mâché approach to learning 3D surface generation. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, Utah, USA: IEEE, 2018. 216−224
    [15] Kanazawa A, Black M J, Jacobs D W, Malik J. End-to-end recovery of human shape and pose. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, Utah, USA: IEEE, 2018. 7122−7131
    [16] Kong C, Lin C H, Lucey S. Using locally corresponding CAD models for dense 3D reconstructions from a single image. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, Hawaii, USA: IEEE, 2017. 4857−4865
    [17] Mescheder L, Oechsle M, Niemeyer M, Nowozin S, Geiger A. Occupancy networks: Learning 3D reconstruction in function space. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019. 4460−4470
    [18] Lillicrap T P, Hunt J J, Pritzel A, Heess N, Erez T, Tassa Y, et al. Continuous control with deep reinforcement learning. arXiv preprint arXiv: 1509.02971, 2015
    [19] Li D, Chen Q F. Dynamic hierarchical mimicking towards consistent optimization objectives. In: Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2020. 7642−7651
    [20] Chang A X, Funkhouser T, Guibas L, Hanrahan P, Huang Q X, Li Z M, et al. ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv: 1512.03012, 2015
    [21] Durou J D, Falcone M, Sagona M. Numerical methods for shape-from-shading: A new survey with benchmarks. Computer Vision and Image Understanding, 2008, 109(1): 22−43 doi: 10.1016/j.cviu.2007.09.003
    [22] Zhang R, Tsai P S, Cryer J E, et al. Shape-from-shading: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, 21(8): 690−706 doi: 10.1109/34.784284
    [23] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. In: Proceedings of the 2014 Advances in Neural Information Processing Systems. Montreal, Canada: Curran Associates, Inc., 2014. 2672−2680
    [24] Kingma D P, Welling M. Auto-encoding variational bayes. arXiv preprint arXiv: 1312.6114, 2013
    [25] Kar A, Häne C, Malik J. Learning a multi-view stereo machine. In: Proceedings of the 2017 Advances in Neural Information Processing Systems. Long Beach, CA, USA: Curran Associates, Inc., 2017. 365−376
    [26] Tatarchenko M, Dosovitskiy A, Brox T. Octree generating networks: Efficient convolutional architectures for high-resolution 3D outputs. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 2088−2096
    [27] Wang W Y, Ceylan D, Mech R, Neumann U. 3DN: 3D deformation network. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019. 1038−1046
    [28] Bernardini F, Mittleman J, Rushmeier H, Silva C, Taubin G. The ball-pivoting algorithm for surface reconstruction. IEEE Transactions on Visualization and Computer Graphics, 1999, 5(4): 349−359 doi: 10.1109/2945.817351
    [29] Kazhdan M, Hoppe H. Screened Poisson surface reconstruction. ACM Transactions on Graphics (ToG), 2013, 32(3): 1−13
    [30] Calakli F, Taubin G. SSD: Smooth signed distance surface reconstruction. Computer Graphics Forum. Oxford, UK: Blackwell Publishing Ltd, 2011, 30(7): 1993−2002
    [31] Chen Z, Zhang H. Learning implicit fields for generative shape modeling. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019. 5939−5948
    [32] Xu Q G, Wang W Y, Ceylan D, Mech R, Neumann U. DISN: Deep implicit surface network for high-quality single-view 3D reconstruction. In: Proceedings of the 2019 Advances in Neural Information Processing Systems. Vancouver, Canada: Curran Associates, Inc., 2019. 492−502
    [33] Wang Q L, Wu B G, Zhu P F, Li P H, Zuo W M, Hu Q H. ECA-Net: Efficient channel attention for deep convolutional neural networks. In: Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2020. 11531−11539
    [34] Selvaraju R R, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, Hawaii, USA: IEEE, 2017. 618−626
    [35] Garland M, Heckbert P S. Simplifying surfaces with color and texture using quadric error metrics. In: Proceedings of the Visualization '98 (Cat. No.98CB362-), Research Triangle Park, NC, USA, 1998. 263−269
    [36] Lorensen W E, Cline H E. Marching cubes: A high resolution 3D surface construction algorithm. ACM Siggraph Computer Graphics, 1987, 21(4): 163−169 doi: 10.1145/37402.37422
    [37] Drucker H, Le Cun Y. Improving generalization performance using double backpropagation. IEEE Transactions on Neural Networks, 1992, 3(6): 991−997 doi: 10.1109/72.165600
    [38] Oh Song H, Xiang Y, Jegelka S, Savarese S. Deep metric learning via lifted structured feature embedding. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. 4004−4012
    [39] Stutz D, Geiger A. Learning 3D shape completion from laser scan data with weak supervision. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, Utah, USA: IEEE, 2018. 1955−1964
    [40] De Vries H, Strub F, Mary J, Larochelle H, Pietquin O, Courville A C. Modulating early visual processing by language. In: Proceedings of the 2017 Advances in Neural Information Processing Systems. Long Beach, CA, USA: Curran Associates, Inc., 2017. 6594−6604
    [41] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. 770−778
    [42] Kingma D P, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv: 1412.6980, 2014
    [43] Zhu C C, Liu H, Yu Z H, Sun X H. Towards omni-supervised face alignment for large scale unlabeled videos. In: Proceedings of the AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 13090−13097
    [44] Zhu C C, Li X Q, Li J D, Ding G T, Tong W Q. Spatial-temporal knowledge integration: Robust self-supervised facial landmark tracking. In: Proceedings of the 28th ACM International Conference on Multimedia. Seattle, WA, USA: ACM, 2020. 4135−4143
Publication history
  • Received:  2020-07-13
  • Revised:  2020-12-05
  • Published online:  2021-03-02
