• Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (5): 894-901.

• Graphics and Images •

Cross-modal image emotion perception captioning based on generative adversarial network

YANG Chunmiao,WANG Yang,HAN Liying,SUN Hebin   

  1. (School of Electronic and Information Engineering,Hebei University of Technology,Tianjin 300401,China)
  • Received:2023-12-26 Revised:2024-04-16 Online:2025-05-25 Published:2025-05-27

Abstract: Image captioning is a cross-modal task that aims to produce text conforming to the content of an image based on its visual information. Although progress has been made in image captioning, there is still room for improvement in capturing fine-grained affective semantic features and in the emotional delicacy of the generated descriptions. To address this problem, a model based on a generative adversarial network is proposed to generate aspect-level emotional language descriptions. With an encoder-decoder structure integrating a dual-modal attention mechanism as the generator and a convolutional neural network as the discriminator, the model's accuracy in cross-modal emotion matching and the reliability of the generated emotional statements are improved. Transfer learning and the RMSProp optimization algorithm are introduced to improve the interpretability of the model. Finally, experiments on the MSCOCO and SentiCap datasets show that the model exhibits excellent convergence performance and attains a high accuracy rate.
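The abstract names RMSProp as the optimizer used to train the adversarial model. As a minimal illustration of the rule itself (not the paper's training code, whose details are not given here), RMSProp keeps an exponential moving average of squared gradients and divides the step size by its root; the toy objective `f(theta) = theta**2` below is purely illustrative:

```python
import math

def rmsprop_step(theta, grad, v, lr=0.05, rho=0.9, eps=1e-8):
    # Accumulate an exponential moving average of squared gradients,
    # then scale each update by the root of that average (RMSProp rule).
    v = rho * v + (1 - rho) * grad * grad
    theta = theta - lr * grad / (math.sqrt(v) + eps)
    return theta, v

# Minimize the toy objective f(theta) = theta**2, whose gradient is 2*theta.
theta, v = 3.0, 0.0
for _ in range(500):
    grad = 2.0 * theta
    theta, v = rmsprop_step(theta, grad, v)
```

Because the per-parameter denominator adapts to recent gradient magnitudes, RMSProp tends to be more stable than plain SGD on the noisy, non-stationary gradients typical of GAN training, which is presumably why it is chosen here.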

Key words: generative adversarial network, image emotion captioning, RMSProp optimization algorithm