• Official journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (5): 894-901.

• Graphics and Images •

Cross-modal image emotion perception captioning based on generative adversarial network

YANG Chunmiao, WANG Yang, HAN Liying, SUN Hebin

  1. (School of Electronic and Information Engineering, Hebei University of Technology, Tianjin 300401, China)
  • Received: 2023-12-26 Revised: 2024-04-16 Online: 2025-05-25 Published: 2025-05-27
  • Funding:
    National Natural Science Foundation of China (62241103); Key Project of the Hebei Provincial Department of Education (ZD2020304); Hebei Province Funding Project for Introduced Overseas-Educated Personnel (C20220316)

Abstract: Image captioning is a cross-modal task that aims to generate text consistent with image content from visual information. Although image captioning has made progress, there is still room for improvement in capturing fine-grained affective semantic features and in the emotional nuance of the generated descriptions. To address this problem, a model based on a generative adversarial network is proposed to generate aspect-level emotional language descriptions. The generator is an encoder-decoder structure that incorporates a dual-modal attention mechanism, and the discriminator is a convolutional neural network; together they improve the accuracy of cross-modal emotion matching and the reliability of the generated emotional sentences. Transfer learning and the RMSProp optimization algorithm are introduced to enhance the interpretability of the model. Finally, experiments on the MSCOCO and SentiCap datasets show that the model converges well and achieves high accuracy.

Key words: generative adversarial network, image emotion captioning, RMSProp optimization algorithm
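
The abstract names RMSProp as the optimizer but gives no hyperparameters or training loop. As a rough illustration only, the sketch below shows the standard RMSProp update rule on a toy quadratic. The function names `rmsprop_step` and `minimize_quadratic`, the toy objective, and the values of `lr`, `decay`, and `eps` are assumptions made for this example and are not taken from the paper.

```python
import math


def rmsprop_step(params, grads, cache, lr=0.01, decay=0.9, eps=1e-8):
    """Apply one standard RMSProp update in place and return params.

    lr, decay and eps are illustrative defaults, not the paper's settings.
    """
    for i, g in enumerate(grads):
        # Exponential moving average of the squared gradient.
        cache[i] = decay * cache[i] + (1.0 - decay) * g * g
        # Divide the step by the root of that running average.
        params[i] -= lr * g / (math.sqrt(cache[i]) + eps)
    return params


def minimize_quadratic(steps=1000):
    """Minimize the toy objective f(x, y) = x^2 + 10*y^2 (hypothetical)."""
    params = [3.0, -2.0]
    cache = [0.0, 0.0]
    for _ in range(steps):
        # Analytic gradient of the toy objective.
        grads = [2.0 * params[0], 20.0 * params[1]]
        rmsprop_step(params, grads, cache)
    return params


if __name__ == "__main__":
    print(minimize_quadratic())
```

Because each coordinate is scaled by its own running gradient magnitude, the steeply curved `y` direction and the flatter `x` direction move at comparable rates. This per-parameter scaling is the usual reason RMSProp is chosen for adversarial training, where gradient scales can differ widely between parameters.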