Cross-view Image Generation via Mixture Generative Adversarial Network

Wei Xing, Li Jia, Sun Xiao, Liu Shao-Fan, Lu Yang

Citation: Wei Xing, Li Jia, Sun Xiao, Liu Shao-Fan, Lu Yang. Cross-view image generation via mixture generative adversarial network. Acta Automatica Sinica, 2020, 46(x): 1−14. doi: 10.16383/j.aas.c190743


doi: 10.16383/j.aas.c190743
Author biographies:

    WEI Xing received his Ph.D. degree from the University of Science and Technology of China in 2009. He is currently an associate professor at Hefei University of Technology. His research interests include deep learning, internet of things engineering, and autonomous driving solutions. E-mail: weixing@hfut.edu.cn

    LI Jia is a master student at the School of Computer Science and Information Engineering, Hefei University of Technology. Her research interests include natural language processing and emotional dialogue generation. E-mail: lijiajia@mail.hfut.edu.cn

    SUN Xiao, Ph.D., is an associate professor at the Institute of Affective Computing, School of Computer Science and Information Engineering, Hefei University of Technology. His research interests include affective computing, natural language processing, machine learning, and human-computer interaction. Corresponding author of this paper. E-mail: sunx@hfut.edu.cn

    LIU Shao-Fan was born in 1996. He received his bachelor degree from Hefei University of Technology in 2018, where he is currently a graduate student. His research interests include object detection and domain adaptation. E-mail: frank-uzi@hotmail.com

    LU Yang was born in 1967. He received his Ph.D. degree from Hefei University of Technology in 2002. He is currently a professor and Ph.D. supervisor at Hefei University of Technology. His research interests include internet of things engineering and distributed control systems. E-mail: luyang.hf@126.com

Cross-view Image Generation via Mixture Generative Adversarial Network

Funds: Supported by the Anhui Provincial Key Research and Development Program (201904d07020008), the National Key Research and Development Program of China (2018YFC0604404), and the Hefei University of Technology 2018 National Undergraduate Training Program for Innovation and Entrepreneurship (201810359019S)
  • Abstract: Cross-view image generation, i.e., generating images of several other views from an image of one view, is a fundamental problem in fields such as multi-view display and virtual-reality object modeling, and has attracted wide attention. In recent years, generative adversarial networks (GANs) have achieved promising results on this task, but current mainstream methods are confined to a fixed domain, transfer poorly to other scenes, and often produce blurred or distorted images. This paper proposes ViewGAN, a cross-view image generation model based on a mixture generative adversarial network, which consists of multiple generators and one multi-class discriminator and can be flexibly transferred to various cross-view generation scenes. In ViewGAN, the generators are trained simultaneously, each aiming at generating images of a different view. In addition, we propose a penalty mechanism based on Monte Carlo search that pushes each generator to produce high-quality images and to specialize in its designated view. Extensive experiments on the DeepFashion, Dayton and ICG Lab6 datasets demonstrate that our model outperforms current mainstream models on Inception Score and Top-k accuracy, and improves SSIM by 32.29%, PSNR by 14.32% and SD by 10.18%.
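The abstract's multi-class discriminator can be pictured as a (k+1)-way classifier: a real image of view i should be classified as class i, and any generated image as an extra "fake" class. The following NumPy sketch illustrates that objective only; the function and variable names are hypothetical and the paper's actual loss terms are not specified here.

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the last axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multiclass_d_loss(logits, labels):
    """Cross-entropy for a (k+1)-way discriminator.

    logits: (batch, k+1) scores; class k is the 'fake' class.
    labels: (batch,) integer view labels, or k for generated images.
    """
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

# toy example: k = 3 views, batch of 4 (two real images, two fakes)
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4))
labels = np.array([0, 2, 3, 3])   # class 3 = fake
loss = multiclass_d_loss(logits, labels)
print(loss > 0)   # True: cross-entropy is positive for imperfect predictions
```

Minimizing this loss trains a single discriminator to separate the k views and reject fakes at the same time, which is what lets one discriminator serve all k generators.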
  • Fig. 1 Examples of ViewGAN on three datasets, i.e., DeepFashion, Dayton and ICG Lab6. ViewGAN employs a coarse-to-fine generation method: a low-resolution target image (LR image) is generated first, and a high-resolution target image (HR image) is then obtained by filling in further details.

    Fig. 2 The framework of ViewGAN with k generators $ \left\{G_{i}\right\}_{i = 1}^{k} $ and one multi-class discriminator. Each generator is responsible for generating images of a certain view.

    Fig. 3 The framework of the generator $ G_j $. It consists of three modules: a coarse image module, a Monte Carlo search module and a fine image module. During training, an LR image is first generated by the coarse image module from the input image; the Monte Carlo search module then performs N searches, whose results are combined by an attention mechanism; finally, an HR image is generated by the fine image module conditioned on the output of the attention mechanism.

    Fig. 4 Results generated by different models on the DeepFashion dataset

    Fig. 5 Results generated by different models on the Dayton dataset

    Fig. 6 Results generated by different models on the ICG Lab6 dataset

    Fig. 7 Visualization of the ViewGAN generation process. (a) is the input image; (b) is the LR image generated by the coarse image module; (c) are intermediate results generated by the Monte Carlo search module; (d) is the HR image generated by the fine image module.

    Table 1 Generator network architecture

    Network Layer Input Output
    Down-Sample CONV(N64, K4x4, S1, P3)-BN-Leaky Relu (256, 256, 3) (256, 256, 64)
    CONV(N128, K4x4, S2, P1)-BN-Leaky Relu (256, 256, 64) (128, 128, 128)
    CONV(N256, K4x4, S2, P1)-BN-Leaky Relu (128, 128, 128) (64, 64, 256)
    CONV(N512, K4x4, S2, P1)-BN-Leaky Relu (64, 64, 256) (32, 32, 512)
    Residual Block CONV(N512, K4x4, S1, P1)-BN-Leaky Relu (32, 32, 512) (32, 32, 512)
    CONV(N512, K4x4, S1, P1)-BN-Leaky Relu (32, 32, 512) (32, 32, 512)
    CONV(N512, K4x4, S1, P1)-BN-Leaky Relu (32, 32, 512) (32, 32, 512)
    CONV(N512, K4x4, S1, P1)-BN-Leaky Relu (32, 32, 512) (32, 32, 512)
    CONV(N512, K4x4, S1, P1)-BN-Leaky Relu (32, 32, 512) (32, 32, 512)
    Up-Sample DECONV(N256, K4x4, S2, P1)-BN-Leaky Relu (32, 32, 512) (64, 64, 256)
    DECONV(N128, K4x4, S2, P1)-BN-Leaky Relu (64, 64, 256) (128, 128, 128)
    DECONV(N64, K4x4, S1, P3)-BN-Leaky Relu (128, 128, 128) (256, 256, 64)
    CONV(N3, K4x4, S1, P3)-BN-Leaky Relu (256, 256, 64) (256, 256, 3)
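The stride-2 spatial sizes in Table 1 follow the standard convolution shape rule out = ⌊(in + 2·pad − kernel) / stride⌋ + 1, and its transpose for the DECONV layers. A quick checker, assuming that convention (note the stride-1, pad-3 rows with a 4×4 kernel do not fit this rule, which suggests a typo in the table's kernel size or padding):

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    # standard convolution output size
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, pad=1):
    # transposed convolution output size (output_padding = 0)
    return (size - 1) * stride - 2 * pad + kernel

# down-sampling path of Table 1: 256 -> 128 -> 64 -> 32
sizes = [256]
for _ in range(3):
    sizes.append(conv_out(sizes[-1]))
print(sizes)                             # [256, 128, 64, 32]

# up-sampling path: 32 -> 64 -> 128
print(deconv_out(32), deconv_out(64))    # 64 128
```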

    Table 2 Discriminator network architecture

    Network Layer Input Output
    Input Layer CONV(N64, K3x3, S1, P1)-Leaky Relu (256, 256, 3) (256, 256, 64)
    CONV BLOCK (256, 256, 64) (256, 256, 64)
    CONV BLOCK (256, 256, 64) (128, 128, 128)
    CONV BLOCK (128, 128, 128) (64, 64, 256)
    Inner Layer HIDDEN LAYER (64, 64, 256) (32, 32, 512)
    HIDDEN LAYER (32, 32, 512) (32, 32, 64)
    DECONV BLOCK (32, 32, 64) (64, 64, 64)
    DECONV BLOCK (64, 64, 64) (128, 128, 64)
    DECONV BLOCK (128, 128, 64) (256, 256, 64)
    Output layer CONV(N64, K3x3, S1, P1)-Leaky Relu (256, 256, 64) (256, 256, 3)
    CONV(N64, K3x3, S1, P1)-Leaky Relu
    CONV(N3, K3x3, S1, P1)-Tanh

    Table 3 Inception Score of different models. For this metric, higher is better.

    Model DeepFashion Dayton ICG Lab6
    all classes Top-1 class Top-5 class all classes Top-1 class Top-5 class all classes Top-1 class Top-5 class
    Pix2Pix 3.37 2.23 3.44 2.85 1.93 2.91 2.54 1.69 2.49
    X-Fork 3.45 2.57 3.56 3.07 2.24 3.09 4.65 2.14 3.85
    X-Seq 3.83 2.68 4.02 2.74 2.13 2.77 4.51 2.05 3.66
    VariGAN 3.79 2.71 3.94 2.77 2.19 2.79 4.66 2.15 3.72
    SelectionGAN 3.81 2.72 3.91 3.06 2.27 3.13 5.64 2.52 4.77
    ViewGAN 4.10 2.95 4.32 3.18 2.27 3.36 5.92 2.71 4.91
    Real Data 4.88 3.31 5.00 3.83 2.58 3.92 6.46 2.86 5.47
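Table 3's Inception Score is exp(E_x[KL(p(y|x) ‖ p(y))]) over classifier predictions: it rewards predictions that are individually confident yet diverse across images. A minimal NumPy implementation of the formula (the actual metric obtains p(y|x) from an ImageNet-pretrained Inception network; the toy inputs here are synthetic):

```python
import numpy as np

def inception_score(pyx):
    """pyx: (n_images, n_classes) rows of classifier probabilities."""
    py = pyx.mean(axis=0, keepdims=True)          # marginal p(y)
    kl = (pyx * (np.log(pyx + 1e-12) - np.log(py + 1e-12))).sum(axis=1)
    return float(np.exp(kl.mean()))

# confident, diverse predictions score high ...
sharp = np.eye(4).repeat(2, axis=0) * 0.97 + 0.01
sharp /= sharp.sum(axis=1, keepdims=True)
# ... while uniform predictions score exactly 1.0
flat = np.full((8, 4), 0.25)
print(inception_score(sharp) > inception_score(flat))   # True
print(round(inception_score(flat), 4))                  # 1.0
```

This is why "Real Data" in Table 3 gives the ceiling: real images yield the sharpest and most diverse classifier predictions.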

    Table 4 Top-k prediction accuracies of different models. For this metric, higher is better.

    Model DeepFashion Dayton ICG Lab6
    Top-1 class Top-5 class Top-1 class Top-5 class Top-1 class Top-5 class
    Pix2Pix 7.34 9.28 25.79 32.68 6.80 9.15 23.55 27.00 1.33 1.62 5.43 6.79
    X-Fork 20.68 31.35 50.45 64.12 30.00 48.68 61.57 78.84 5.94 10.36 20.83 30.45
    X-Seq 16.03 24.31 42.97 54.52 30.16 49.85 62.59 80.70 4.87 8.94 17.13 24.47
    VariGAN 25.67 31.43 55.52 63.70 32.21 52.69 67.95 84.31 10.44 20.49 33.45 41.62
    SelectionGAN 41.57 64.55 72.30 88.65 42.11 68.12 77.74 92.89 28.35 54.67 62.91 76.44
    ViewGAN 65.73 95.77 91.65 98.21 69.39 89.88 93.47 98.78 58.97 83.20 88.74 93.25

    Table 5 SSIM, PSNR, SD and speed of different models. FPS is the number of images processed per second during testing. For all metrics, higher is better.

    Model DeepFashion Dayton ICG Lab6 Speed (FPS)
    SSIM PSNR SD SSIM PSNR SD SSIM PSNR SD
    Pix2Pix 0.39 17.67 18.55 0.42 17.63 19.28 0.23 15.71 16.59 166 ± 5
    X-Fork 0.45 19.07 18.67 0.50 19.89 19.45 0.27 16.38 17.35 87 ± 7
    X-Seq 0.42 18.82 18.44 0.50 20.28 19.53 0.28 16.38 17.27 75 ± 3
    VariGAN 0.57 20.14 18.79 0.52 21.57 19.77 0.45 17.58 17.89 70 ± 5
    SelectionGAN 0.53 23.15 19.64 0.59 23.89 20.02 0.61 26.67 19.76 66 ± 6
    ViewGAN 0.70 26.47 21.63 0.74 25.97 21.37 0.80 28.61 21.77 62 ± 2

    Table 6 Results of the minimum-training-data experiment

    Training data (images) SSIM PSNR SD
    6000(100%) 0.70 26.47 21.63
    5400(90%) 0.68 26.08 20.95
    4800(80%) 0.66 24.97 20.31
    4200(70%) 0.59 23.68 20.00
    3600(60%) 0.51 21.90 18.89

    Table 7 Ablation study of the proposed ViewGAN

    Model Architecture SSIM PSNR SD
    A Pix2Pix 0.46 19.66 18.89
    B A + coarse-to-fine generation 0.53 22.90 19.31
    C B + mixture generative adversarial network 0.60 23.77 20.03
    D C + intra-class loss 0.60 23.80 20.11
    E D + penalty mechanism 0.70 26.47 21.63
  • [1] Zhu X, Yin Z, Shi J, et al. Generative adversarial frontal view to bird view synthesis. In: 2018 International Conference on 3D Vision (3DV). Verona, Italy: IEEE, 2018: 454−463.
    [2] Regmi K, Borji A. Cross-view image synthesis using conditional gans. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, Utah: IEEE, 2018: 3501−3510.
    [3] Zhai M, Bessinger Z, Workman S, et al. Predicting ground-level scene layout from aerial imagery. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 867−875.
    [4] Zhao B, Wu X, Cheng Z Q, et al. Multi-view image generation from a single-view. In: 2018 ACM Multimedia Conference on Multimedia Conference. Seoul, Republic of Korea: ACM, 2018: 383−391.
    [5] Tang H, Xu D, Sebe N, et al. Multi-channel attention selection gan with cascaded semantic guidance for cross-view image translation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019: 2417−2426.
    [6] Kingma D P, Welling M. Auto-encoding variational bayes. In: International Conference on Learning Representations. Banff, Canada, 2014.
    [7] Park E, Yang J, Yumer E, et al. Transformation-grounded image generation network for novel 3d view synthesis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 3500−3509.
    [8] Choy C B, Xu D, Gwak J Y, et al. 3d-r2n2: A unified approach for single and multi-view 3d object reconstruction. In: European Conference on Computer Vision. Amsterdam, Netherlands: Springer, Cham, 2016: 628−644.
    [9] Wang Kun-Feng, Gou Chao, Duan Yan-Jie, Lin Yi-Lun, Zheng Xin-Hu, Wang Fei-Yue. Generative adversarial networks: the state of the art and beyond. Acta Automatica Sinica, 2017, 43(3): 321−332. doi: 10.16383/j.aas.2017.y000003
    [10] Browne C B, Powley E, Whitehouse D, et al. A survey of monte carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in games, 2012, 4(1): 1−43 doi: 10.1109/TCIAIG.2012.2186810
    [11] Srivastava A, Valkov L, Russell C, et al. Veegan: Reducing mode collapse in gans using implicit variational learning. In: Advances in Neural Information Processing Systems. Long Beach, CA, USA: Curran Associates, Inc., 2017: 3308−3318.
    [12] Isola P, Zhu J Y, Zhou T, et al. Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. Honolulu, HI, USA: IEEE, 2017: 1125−1134.
    [13] Yan X, Yang J, Sohn K, et al. Attribute2image: Conditional image generation from visual attributes. In: European Conference on Computer Vision. Amsterdam, Netherlands: Springer, Cham, 2016: 776−791.
    [14] Gregor K, Danihelka I, Graves A, et al. Draw: A recurrent neural network for image generation. In: International Conference on Machine Learning. Lille, France: PMLR, 2015.
    [15] Tang Xian-Lun, Du Yi-Ming, Liu Yu-Wei, Li Jia-Xin, Ma Yi-Wei. Image recognition with conditional deep convolutional generative adversarial networks. Acta Automatica Sinica, 2018, 44(5): 855−864. doi: 10.16383/j.aas.2018.c170470
    [16] Dumoulin V, Belghazi I, Poole B, et al. Adversarially learned inference.[Online], available: https://arxiv.gg363.site/abs/1606.00704, February 21, 2017
    [17] Chen X, Duan Y, Houthooft R, et al. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In: Advances in neural information processing systems. Centre Convencions Internacional Barcelona, Barcelona: Curran Associates, Inc., 2016: 2172−2180.
    [18] Zhang H, Xu T, Li H, et al. Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 5907−5915.
    [19] Johnson J, Gupta A, Fei-Fei L. Image generation from scene graphs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018: 1219−1228.
    [20] Zhu J Y, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision. Venice, Italy: IEEE, 2017: 2223−2232.
    [21] Choi Y, Choi M, Kim M, et al. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018: 8789−8797.
    [22] Dosovitskiy A, Springenberg J T, Tatarchenko M, et al. Learning to generate chairs, tables and cars with convolutional networks. IEEE transactions on pattern analysis and machine intelligence, 2016, 39(4): 692−705
    [23] Wu J, Xue T, Lim J J, et al. Single image 3d interpreter network. In: European Conference on Computer Vision. Amsterdam, Netherlands: Springer, Cham, 2016: 365−382.
    [24] Salimans T, Goodfellow I, Zaremba W, et al. Improved techniques for training gans. In: Advances in neural information processing systems. Centre Convencions Internacional Barcelona, Barcelona: Curran Associates, Inc., 2016: 2234−2242.
    [25] Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. Munich, Germany: Springer, Cham, 2015: 234−241.
    [26] Liu Z, Luo P, Qiu S, et al. Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. In: Proceedings of the IEEE conference on computer vision and pattern recognition. Las Vegas, NV, USA: IEEE, 2016: 1096−1104.
    [27] Vo N N, Hays J. Localizing and orienting street views using overhead imagery. In: European conference on computer vision. Amsterdam, Netherlands: Springer, Cham, 2016: 494−509.
    [28] Possegger H, Sternig S, Mauthner T, et al. Robust real-time tracking of multiple objects by volumetric mass densities. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE, 2013: 2395−2402.
    [29] Deng J, Dong W, Socher R, et al. Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. Miami, FL, USA: IEEE, 2009: 248−255.
    [30] Zhou B, Lapedriza A, Khosla A, et al. Places: A 10 million image database for scene recognition. IEEE transactions on pattern analysis and machine intelligence, 2017, 40(6): 1452−1464
    [31] Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. Harrahs and Harveys, Lake Tahoe: Curran Associates, Inc., 2012: 1097−1105.
    [32] Yamaguchi K, Hadi Kiapour M, Berg T L. Paper doll parsing: Retrieving similar styles to parse clothing items. In: Proceedings of the IEEE international conference on computer vision. Sydney, Australia: IEEE, 2013: 3519−3526.
    [33] Ledig C, Theis L, Huszár F, et al. Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. Honolulu, HI, USA: IEEE, 2017: 4681−4690.
    [34] Tang Yi-Ling, Jiang Shun-Liang, Xu Shao-Ping, Liu Ting-Yun, Li Chong-Xi. Asymmetrically distorted stereoscopic image quality assessment based on ocular dominance. Acta Automatica Sinica, 2019, 45(11): 2092−2106. doi: 10.16383/j.aas.c190124
    [35] Mathieu M, Couprie C, LeCun Y. Deep multi-scale video prediction beyond mean square error.[Online], available: https://arxiv.gg363.site/abs/1511.05440, February 26, 2016
Publication history
  • Received: 2019-10-25
  • Accepted: 2020-02-23
