融合自注意力机制和相对鉴别的无监督图像翻译

林泓; 任硕; 杨益; 张杨忆

doi:10.16383/j.aas.c190074

融合自注意力机制和相对鉴别的无监督图像翻译

doi: 10.16383/j.aas.c190074 cstr: 32138.14.j.aas.c190074

林泓^1,,
任硕^1,,
杨益^1, ,,
张杨忆^1,

1.
武汉理工大学计算机科学与技术学院武汉 430063

详细信息

作者简介:
林泓武汉理工大学计算机科学与技术学院副教授.主要研究方向为图像处理, 数据挖掘, 计算机语言与编译技术.E-mail:linhong@whut.edu.cn

任硕武汉理工大学硕士研究生.主要研究方向为计算机视觉和模式识别. E-mail: rensho555@126.com

张杨忆武汉理工大学硕士研究生.主要研究方向为计算机视觉和模式识别. E-mail: whutzyy95@163.com

通讯作者:
杨益武汉理工大学计算机学院讲师.主要研究方向为图像处理和模式识别.本文通信作者.E-mail: yang_yi@whut.edu.cn

计量
- 文章访问数: 878
- HTML全文浏览量: 845
- PDF下载量: 196
- 被引次数: 0
出版历程
- 收稿日期: 2019-01-29
- 录用日期: 2019-05-19
- 刊出日期: 2021-10-13

Unsupervised Image-to-Image Translation With Self-Attention and Relativistic Discriminator Adversarial Networks

LIN Hong^1
,,
REN Shuo^1
,,
YANG Yi^{1
, ,},
ZHANG Yang-Yi^1
,

1.
College of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063

More Information

Author Bio:
LIN Hong Associate professor at the College of Computer Science and Technology, Wuhan University of Technology. Her research interest covers image processing, data mining, computer language, and compilation technology

REN Shuo Master student at Wuhan University of Technology. His research interest covers computer vision and pattern recognition

ZHANG Yang-Yi Master student at Wuhan University of Technology. Her research interest covers computer vision and pattern recognition

Corresponding author: YANG Yi Lecturer at the College of Computer Science and Technology, Wuhan University of Technology. Her research interest covers image processing and pattern recognition. Corresponding author of this paper

摘要

摘要: 无监督图像翻译使用非配对训练数据能够完成图像中对象变换、季节转移、卫星与路网图相互转换等多种图像翻译任务.针对基于生成对抗网络(Generative adversarial network, GAN)的无监督图像翻译中训练过程不稳定、无关域改变较大而导致翻译图像细节模糊、真实性低的问题, 本文基于对偶学习提出一种融合自注意力机制和相对鉴别的无监督图像翻译方法.首先, 生成器引入自注意力机制加强图像生成过程中像素间远近距离的关联关系, 在低、高卷积层间增加跳跃连接, 降低无关图像域特征信息损失.其次, 判别器使用谱规范化防止因鉴别能力突变造成的梯度消失, 增强训练过程中整体模型的稳定性.最后, 在损失函数中基于循环重构增加自我重构一致性约束条件, 专注目标域的转变, 设计相对鉴别对抗损失指导生成器和判别器之间的零和博弈, 完成无监督的图像翻译.在Horse & Zebra、Summer & Winter以及AerialPhoto & Map数据集上的实验结果表明:相较于现有GAN的图像翻译方法, 本文能够建立更真实的图像域映射关系, 提高了生成图像的翻译质量.
- 图像翻译 /
- 对偶学习 /
- 生成对抗网络 /
- 自注意力机制 /
- 相对鉴别 /
- 无监督学习
Abstract: Unsupervised image-to-image translation using unpaired training data can accomplish a variety of image translation tasks such as object transformation, seasonal transfer, and satellite and map transformation. The image-to-image translation method based on generative adversarial network (GAN) has not been satisfying due to the following reasons, the training process is unstable and the irrelevant domain changes greatly, the output images are blurred in detail and low in authenticity. This paper proposes an unsupervised image-to-image translation method with self-attention and relativistic discriminator adversarial networks based on dual learning. Firstly, in the generator, self-attention mechanism is designed to build long-short-range dependency for image generation tasks. Skip-connection between low and high convolution layers help reduce the loss of feature information in irrelevant image domain. Then, in the discriminator, spectral normalization is used to prevent the gradient disappearing caused by the mutation of the discrimination ability to enhance training stability. Finally, in the loss function, the self-reconstruction consistency is added on the basis of loop reconstruction to focus on target image domain change. The relativistic adversarial loss is designed to guide the zero-sum game between generator and discriminator. The experimental results from the Horse & Zebra, Summer & Winter, and AerialPhoto & Map datasets demonstrate that compared with the current image translation methods, our method can establish a more realistic image domain mapping relationship and improve the translation quality of the generated image.
- Image-to-image translation /
- dual learning /
- generative adversarial networks (GAN) /
- self-attention /
- relativistic discriminator /
- unsupervised learning
Recommended by Associate Editor BAI Xiang
注释:

1) 本文责任编委白翔

HTML全文

图 1 模型整体结构

Fig. 1 The structure of model

下载: 全尺寸图片幻灯片

图 2 生成器网络

Fig. 2 Generator

下载: 全尺寸图片幻灯片

图 3 判别器网络

Fig. 3 Discriminator

下载: 全尺寸图片幻灯片

图 4 相对对抗收敛效果

Fig. 4 Relative discriminator convergence effect

下载: 全尺寸图片幻灯片

图 5 本文不同条件实验结果

Fig. 5 Experimental results under different conditions

下载: 全尺寸图片幻灯片

图 6 卫星图与路网图翻译效果对比

Fig. 6 Comparison of AerialPhoto & Map image translation

下载: 全尺寸图片幻灯片

图 7 多种方法翻译效果对比

Fig. 7 Comparison of multiple image translation methods

下载: 全尺寸图片幻灯片

表 1 生成器网络结构参数设置

Table 1 The parameter setting of generator

序号	区域划分	层类型	卷积核	步长	深度	归一化	激活函数
0	下采样	Convolution	$ 7 \times 7 $	1	64	IN	ReLU
1	下采样	Convolution	$ 3 \times 3 $	2	128	IN	ReLU
2	下采样	Convolution	$ 3 \times 3 $	2	256	IN	ReLU
3	中间区	Residual Block	$ 3 \times 3 $	1	256	IN	ReLU
4	中间区	Residual Block	$ 3 \times 3 $	1	256	IN	ReLU
5	中间区	Residual Block	$ 3 \times 3 $	1	256	IN	ReLU
6	中间区	Residual Block	$ 3 \times 3 $	1	256	IN	ReLU
7	中间区	Residual Block	$ 3 \times 3 $	1	256	IN	ReLU
8	中间区	Residual Block	$ 3 \times 3 $	1	256	IN	ReLU
9	上采样	Deconvlution	$ 3 \times 3 $	2	128	IN	ReLU
10	上采样	Self-Attention	–	–	–	–	–
11	上采样	Deconvlution	$ 3 \times 3 $	2	64	IN	ReLU
12	上采样	Convolution	$ 7 \times 7 $	1	3	–	Tanh

下载: 导出CSV

表 2 判别器网络结构参数设置

Table 2 The parameter setting of discriminator

序号	层类型	卷积核	步长	深度	归一化	激活函数
0	Convolution	$ 4 \times 4 $	2	64	–	LeakyReLU
1	Convolution	$ 4 \times 4 $	2	128	SN	LeakyReLU
2	Convolution	$ 4 \times 4 $	2	256	SN	LeakyReLU
3	Convolution	$ 4 \times 4 $	2	512	SN	LeakyReLU
4	Convolution	$ 4 \times 4 $	1	1	–	–

下载: 导出CSV

表 3 本文不同条件分类准确率

Table 3 CA under different conditions

数据集	真实图像	相对对抗	自注意力	自注意力+相对对抗
Horse&Zebra	0.985	0.849	0.862	0.873
Summer&Winter	0.827	0.665	0.714	0.752

下载: 导出CSV

表 4 用户调研评价(%)

Table 4 User study (%)

翻译任务	CycleGAN ^[7]	UNIT ^[8]	本文方法
Horse $ \rightarrow $ Zebra	10.6	13.3	76.1
Zebra $ \rightarrow $ Horse	7.6	6.6%	85.8
Summer $ \rightarrow $ Winter	24.6	19.3%	56.1
Winter $ \rightarrow $ Summer	15	11.7%	73.3

下载: 导出CSV

表 5 分类准确率对比

Table 5 Classification accuracy comparison

数据集	真实图像	CycleGAN ^[7]	UNIT ^[8]	本文方法
Horse&Zebra	0.985	0.850	0.789	0.873
Summer&Winter	0.827	0.644	0.591	0.752

下载: 导出CSV

表 6 感知距离对比

Table 6 Perceptual distance comparison

翻译任务	真实图像	CycleGAN ^[7]	UNIT ^[8]	本文方法
Horse $ \rightarrow $ Zebra	1.177	1.133	1.054	1.137
Zebra $ \rightarrow $ Horse	1.198	1.141	1.056	1.147
Summer $ \rightarrow $ Winter	1.824	1.189	1.153	1.211
Winter $ \rightarrow $ Summer	1.272	1.223	1.209	1.259

下载: 导出CSV

参考文献(20)

[1]	Isola P, Zhu J Y, Zhou T H, Efros A A. Image-to-image translation with conditional adversarial networks. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, Hawaii, USA: IEEE, 2017. 1125-1134
[2]	Xu F R, Ma B P, Chang H, Shan S G, Chen X L. Style transfer with adversarial learning for cross-dataset person re-identification. In: Proceedings of the 2018 Asian Conference on Computer Vision. Perth, Australia: Springer, 2018.
[3]	Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, WardeFarley D, Ozair S, et al. Generative adversarial nets. In: Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press, 2014. 2672-2680
[4]	Huang H, Yu P S, Wang C. An introduction to image synthesis with generative adversarial nets. arXiv Preprint arXiv: 1803.04469, 2018.
[5]	Mirza M, Osindero S. Conditional generative adversarial nets. arXiv preprint arXiv: 1411.1784, 2014.
[6]	Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: Proceedings of the 18th International Conference on Medical Image Computing and Computer-assisted Intervention. Munich, Germany: Springer, 2015. 234-241
[7]	Zhu J Y, Park T, Isola P, Efros A A. Unpaired image-toimage translation using cycle-consistent adversarial networks. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 2223 -2232
[8]	Liu M Y, Breuel T, Kautz J. Unsupervised image-to-image translation networks. In: Proceedings of the 31st Conference on Neural Information Processing Systems. Long Beach, California, USA: MIT Press, 2017. 700-708
[9]	Miyato T, Kataoka T, Koyama M, Yoshida Y. Spectral normalization for generative adversarial networks. arXiv Preprint arXiv: 1802.05957, 2018.
[10]	He D, Xia Y C, Qin T, Wang L W, Yu N H, Liu T Y, et al. Dual learning for machine translation. In: Proceedings of the 30th Conference on Neural Information Processing Systems. Barcelona, Spain: NIPS, 2016. 820-828
[11]	Johnson J, Alahi A, Li F F. Perceptual losses for real-time style transfer and super-resolution. In: Proceedings of the 2016 European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016. 694-711
[12]	He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. 770-778
[13]	Ulyanov D, Vedaldi A, Lempitsky V. Instance normalization: The missing ingredient for fast stylization. arXiv Preprint arXiv: 1607.08022, 2016.
[14]	Jolicoeur-Martineau A. The relativistic discriminator: A key element missing from standard GAN. arXiv Preprint arXiv: 1807.00734, 2018.
[15]	Mao X D, Li Q, Xie H R, Lau R Y, Wang Z, Smolley S P. Least squares generative adversarial networks. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 2794-2802
[16]	Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 2015, 115(3): 211-252
[17]	Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In: Proceedings of the 31st Conference on Neural Information Processing Systems. Long Beach, California, USA: MIT Press, 2017. 6626-6637
[18]	Kingma D P, Ba J. Adam: A method for stochastic optimization. arXiv Preprint arXiv: 1412.6980, 2014.
[19]	Chollet F. Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, Hawaii, USA: IEEE, 2017. 1251-1258
[20]	Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv Preprint arXiv: 1409.1556, 2014.

施引文献

资源附件(0)

访问统计

图(7) / 表(6)

计量

文章访问数: 878
HTML全文浏览量: 845
PDF下载量: 196
被引次数: 0

姓名
邮箱
手机号码
标题
留言内容
验证码

留言板

融合自注意力机制和相对鉴别的无监督图像翻译

doi: 10.16383/j.aas.c190074 cstr: 32138.14.j.aas.c190074

通讯作者:
杨益武汉理工大学计算机学院讲师.主要研究方向为图像处理和模式识别.本文通信作者.E-mail: yang_yi@whut.edu.cn

计量

Unsupervised Image-to-Image Translation With Self-Attention and Relativistic Discriminator Adversarial Networks

Corresponding author: YANG Yi Lecturer at the College of Computer Science and Technology, Wuhan University of Technology. Her research interest covers image processing and pattern recognition. Corresponding author of this paper

计量

目录

留言板

融合自注意力机制和相对鉴别的无监督图像翻译

doi: 10.16383/j.aas.c190074 cstr: 32138.14.j.aas.c190074

通讯作者: 杨益 武汉理工大学计算机学院讲师.主要研究方向为图像处理和模式识别.本文通信作者.E-mail: yang_yi@whut.edu.cn

计量

出版历程

Unsupervised Image-to-Image Translation With Self-Attention and Relativistic Discriminator Adversarial Networks

Corresponding author: YANG Yi Lecturer at the College of Computer Science and Technology, Wuhan University of Technology. Her research interest covers image processing and pattern recognition. Corresponding author of this paper

计量

出版历程

目录

通讯作者:
杨益武汉理工大学计算机学院讲师.主要研究方向为图像处理和模式识别.本文通信作者.E-mail: yang_yi@whut.edu.cn