• Publication of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (05): 855-861.

• Graphics and Images •

  • Supported by:
    National Natural Science Foundation of China (61976040, 62172073); China Postdoctoral Science Foundation, 70th General Program (2021M700303)

Discrimination-enhanced generative adversarial network in text-to-image generation

TAN Hong-chen1,HUANG Shi-hua2,XIAO He-wen3,YU Bing-bing3,LIU Xiu-ping3   

  (1. School of Artificial Intelligence and Automation, Beijing University of Technology, Beijing 100124;
   2. Department of Computer Science, The Hong Kong Polytechnic University, Hong Kong 999077;
   3. School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China)
  • Received:2021-11-11 Revised:2022-01-07 Accepted:2022-05-25 Online:2022-05-25 Published:2022-05-24



Abstract: Most current text-to-image generation algorithms based on Generative Adversarial Networks (GANs) focus on designing different attention-based generation models to improve the depiction and expression of image details. However, they ignore the discriminator's perception of key local semantics, so the generation model can easily produce poor image details that "fool" the discriminator. This paper proposes a discrimination-enhanced generative adversarial network (DE-GAN), which introduces a word-image discriminative attention module into the discriminator to enhance its ability to perceive and capture key semantics, thereby driving the generation model to produce high-quality image details. Experimental results show that, on the CUB-Bird dataset, DE-GAN achieves an Inception Score (IS) of 4.70, a 4.2% improvement over the baseline model.
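The abstract does not include code, and the paper's exact module is not reproduced here. As a rough NumPy sketch of the general idea, assuming hypothetical function names and feature shapes (word embeddings from a text encoder, local region features from an image encoder), a word-image attention step inside a discriminator could attend over image regions for each word and score the word-image semantic consistency:

```python
import numpy as np

def word_image_attention(word_feats, region_feats):
    """Attend over image regions for each word (illustrative sketch).

    word_feats:   (T, D) word embeddings from a text encoder
    region_feats: (N, D) local image-region features from an image encoder
    Returns (T, D) per-word attended region context and (T, N) weights.
    """
    d = word_feats.shape[1]
    # similarity between every word and every image region
    scores = word_feats @ region_feats.T / np.sqrt(d)          # (T, N)
    # softmax over regions for each word (numerically stable)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    # per-word aggregation of region features
    context = weights @ region_feats                           # (T, D)
    return context, weights

def semantic_consistency_score(word_feats, region_feats):
    """One possible scalar signal for a discriminator: mean cosine
    similarity between each word and its attended image context."""
    context, _ = word_image_attention(word_feats, region_feats)
    num = (word_feats * context).sum(axis=1)
    den = np.linalg.norm(word_feats, axis=1) * np.linalg.norm(context, axis=1)
    return float((num / (den + 1e-8)).mean())

# toy demo with random features: 5 words, 7x7 = 49 image regions
rng = np.random.default_rng(0)
words = rng.normal(size=(5, 16))
regions = rng.normal(size=(49, 16))
print(semantic_consistency_score(words, regions))
```

A score like this could be combined with the usual real/fake adversarial loss so that mismatched or poorly rendered details lower the discriminator's judgment of a generated image; the actual DE-GAN module and losses are defined in the paper itself.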

Key words: text-to-image generation, generative adversarial network, attention mechanism, discrimination model