• A journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (06): 1083-1089.

• Graphics and Images •

A text-to-image model based on a two-phase stacked generative adversarial network with spectral normalization

WANG Xia, XU Hui-ying, ZHU Xin-zhong

1. (College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua 321004, China)
  • Received:2021-06-07 Revised:2021-07-13 Accepted:2022-06-25 Online:2022-06-25 Published:2022-06-17

Abstract: Generating images from text is a challenging task in the machine learning community. Although significant progress has been made, problems such as unstable network training and vanishing gradients persist. To address these shortcomings, this paper proposes a text-to-image generation method, built on the stacked generative adversarial network (StackGAN), that combines spectral normalization with a perceptual loss function. Firstly, spectral normalization is applied to the discriminator, constraining the gradient of each network layer to a fixed range; this slows the discriminator's convergence and thereby improves the stability of network training. Secondly, a perceptual loss function is added to the generator network to strengthen the consistency between the text content and the generated image. The quality of the generated images is evaluated with the Inception score. Experimental results show that, compared with the original StackGAN, the proposed model trains more stably and generates clearer images.
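Spectral normalization, as described in the abstract, divides each discriminator weight matrix by an estimate of its largest singular value (obtained by power iteration), so that every layer is approximately 1-Lipschitz. A minimal NumPy sketch of the idea; the function name, iteration count, and toy matrix are illustrative assumptions, not from the paper:

```python
import numpy as np

def spectral_normalize(W, n_iters=20):
    """Scale W by an estimate of its largest singular value (power iteration),
    so the returned matrix has spectral norm ~1 (an approximately 1-Lipschitz layer)."""
    u = np.random.default_rng(0).standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = W @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ W @ v  # estimated largest singular value of W
    return W / sigma

# Toy weight matrix with singular values 3 and 1.
W = np.array([[3.0, 0.0],
              [0.0, 1.0]])
W_sn = spectral_normalize(W)
print(np.linalg.norm(W_sn, 2))  # → ~1.0
```

In a full GAN, this normalization is re-applied to every discriminator layer at each training step (frameworks usually provide it as a weight wrapper rather than an explicit call).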

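The perceptual loss added to the generator is, in the usual formulation, the squared distance between deep features of the generated and ground-truth images extracted by a fixed pretrained network (typically a VGG layer). To stay self-contained, the sketch below substitutes a frozen random linear-ReLU map for the pretrained extractor; `W_feat` and the image shapes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained feature extractor (e.g. one VGG layer).
# In the paper's setting this would be a fixed CNN, not a random matrix.
W_feat = rng.standard_normal((64, 3 * 8 * 8))

def features(img):
    """Toy feature map: flatten, linear projection, ReLU."""
    return np.maximum(W_feat @ img.ravel(), 0.0)

def perceptual_loss(generated, target):
    """Squared L2 distance between the two images' feature representations."""
    return float(np.sum((features(generated) - features(target)) ** 2))

real = rng.standard_normal((3, 8, 8))  # toy 3x8x8 "images"
fake = rng.standard_normal((3, 8, 8))
print(perceptual_loss(real, real))  # → 0.0 for identical images
print(perceptual_loss(fake, real) > 0)
```

Because the distance is measured in feature space rather than pixel space, the generator is pushed toward images that match the target semantically, which is what ties the generated image to the text content.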
Key words: deep learning, generative adversarial network, text-to-image generation, spectral normalization, perceptual loss function
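The Inception score used for evaluation is IS = exp(E_x[KL(p(y|x) ‖ p(y))]), computed from the class probabilities an Inception-v3 classifier assigns to generated samples. A small NumPy sketch over a hypothetical probability matrix (the classifier itself is assumed, not implemented):

```python
import numpy as np

def inception_score(p_yx, eps=1e-12):
    """p_yx: (N, C) array; row i is the classifier's distribution p(y | x_i).
    Returns exp of the mean KL divergence between p(y|x) and the marginal p(y)."""
    p_y = p_yx.mean(axis=0)  # marginal class distribution over all samples
    kl = np.sum(p_yx * (np.log(p_yx + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Worst case: every sample is classified uniformly -> IS = 1.
uniform = np.full((10, 5), 0.2)
print(inception_score(uniform))  # → 1.0
# Best case: confident and diverse predictions -> IS approaches the class count.
print(inception_score(np.eye(5)))  # ≈ 5
```

Higher scores indicate that individual images are confidently classifiable (sharp) while the set as a whole covers many classes (diverse), which is why the abstract uses it as a proxy for image quality.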