• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (06): 1083-1089.

• Graphics and Images •

  • Supported by:
    National Natural Science Foundation of China (61976196); Zhejiang Province "Ten Thousand Talents Plan" Outstanding Talent Project (2018R51001); Zhejiang Provincial Natural Science Foundation (LZ22F030003)

A text-to-image model based on a two-stage stacked generative adversarial network with spectral normalization

WANG Xia,XU Hui-ying,ZHU Xin-zhong   

  1. (College of Mathematics and Computer Science,Zhejiang Normal University,Jinhua 321004,China)
  • Received:2021-06-07 Revised:2021-07-13 Accepted:2022-06-25 Online:2022-06-25 Published:2022-06-17



Abstract: Generating images from text is a challenging task in the machine learning community. Although significant progress has been made, problems such as unstable network training and vanishing gradients remain. To address these shortcomings, this paper builds on the stacked generative adversarial network (StackGAN) and proposes a text-to-image generation model that combines spectral normalization with a perceptual loss function. Firstly, the model applies spectral normalization to the discriminator, constraining the gradient of each layer to a fixed range; this relatively slows the convergence of the discriminator and thereby improves the stability of network training. Secondly, a perceptual loss function is added to the generator network to enhance the consistency between the text semantics and the generated image. The Inception score is used to evaluate the quality of the generated images. Experimental results show that, compared with the original StackGAN, the proposed model is more stable and generates more realistic images.
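As a rough illustration of the spectral-normalization idea described in the abstract, the PyTorch sketch below wraps each discriminator layer in `torch.nn.utils.spectral_norm`, which rescales the weight by its largest singular value so the layer stays approximately 1-Lipschitz and per-layer gradients remain bounded. The layer sizes and depth here are illustrative assumptions, not the paper's exact StackGAN discriminator architecture.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def sn_conv(in_ch, out_ch):
    # spectral_norm divides the weight by its largest singular value,
    # keeping the layer roughly 1-Lipschitz and bounding its gradient
    return spectral_norm(nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1))

class SNDiscriminator(nn.Module):
    """Toy spectrally normalized discriminator (illustrative sizes)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            sn_conv(3, 64), nn.LeakyReLU(0.2),    # 64x64 -> 32x32
            sn_conv(64, 128), nn.LeakyReLU(0.2),  # 32x32 -> 16x16
            sn_conv(128, 256), nn.LeakyReLU(0.2), # 16x16 -> 8x8
        )
        self.head = spectral_norm(nn.Linear(256 * 8 * 8, 1))

    def forward(self, x):
        h = self.net(x).flatten(1)
        return self.head(h)

d = SNDiscriminator()
score = d(torch.randn(2, 3, 64, 64))
print(score.shape)  # torch.Size([2, 1])
```

Because the normalization acts on the weights rather than the activations, it needs no extra loss term; it simply tempers how fast the discriminator can sharpen, which is the stabilizing effect the abstract describes.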

Key words: deep learning, generative adversarial network, text-to-image generation, spectral normalization, perceptual loss function
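The perceptual-loss idea mentioned above can be sketched as an L2 distance between deep features of the generated and reference images. In practice the extractor would be a frozen pretrained network such as VGG; the tiny stand-in extractor below is a placeholder assumption so the sketch runs self-contained, and the exact network and layer used by the paper are not specified here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerceptualLoss(nn.Module):
    """MSE between deep features of generated and real images.

    `extractor` would normally be a frozen pretrained backbone
    (e.g. VGG features); any feature-producing module works here.
    """
    def __init__(self, extractor):
        super().__init__()
        self.extractor = extractor.eval()
        for p in self.extractor.parameters():
            p.requires_grad = False  # loss network stays fixed

    def forward(self, fake, real):
        # distance in feature space rewards semantic, not pixel, similarity
        return F.mse_loss(self.extractor(fake), self.extractor(real))

# toy extractor standing in for pretrained VGG features (assumption)
extractor = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
loss_fn = PerceptualLoss(extractor)
fake = torch.randn(2, 3, 64, 64)
loss = loss_fn(fake, fake.clone())
print(float(loss))  # identical inputs give a loss of 0.0
```

Added to the generator objective, this term pushes generated images toward the feature statistics of real images that match the text, which is how it enforces the text-image consistency the abstract claims.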