Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (5): 894-901.
• Graphics and Images •
YANG Chunmiao,WANG Yang,HAN Liying,SUN Hebin
Abstract: Image captioning is a cross-modal task that aims to produce text conforming to the image content based on visual information. Although progress has been made in image captioning, there is still room for improvement in capturing fine-grained affective semantic features and in the emotional delicacy of the generated descriptions. To address this problem, a model based on a generative adversarial network is proposed to generate aspect-level emotional descriptions. Using an encoder-decoder structure that integrates a dual-modal attention mechanism as the generator and a convolutional neural network as the discriminator, the model improves the accuracy of cross-modal emotion matching and the reliability of the generated emotional statements. Transfer learning and the RMSProp optimization algorithm are introduced to improve the interpretability of the model. Finally, experiments are carried out on the MSCOCO and SentiCap datasets; the model exhibits excellent convergence performance and attains a high accuracy.
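A minimal sketch of the adversarial setup outlined in the abstract: an encoder-decoder caption generator with a cross-modal attention step and a TextCNN-style convolutional discriminator, both optimized with RMSProp. All layer sizes, the placement of the attention module, and the pairing of image features with captions in the discriminator are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical PyTorch sketch of the generator/discriminator pair; dimensions
# and module choices are assumptions for illustration only.
import torch
import torch.nn as nn

VOCAB, EMB, HID, IMG_FEAT = 10000, 256, 512, 2048

class CaptionGenerator(nn.Module):
    """Encoder-decoder generator: regional image features condition an LSTM decoder."""
    def __init__(self):
        super().__init__()
        self.img_proj = nn.Linear(IMG_FEAT, HID)        # visual encoder projection
        self.embed = nn.Embedding(VOCAB, EMB)
        self.attn = nn.MultiheadAttention(HID, num_heads=8, batch_first=True)
        self.decoder = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, img_feats, captions):
        # img_feats: (B, R, IMG_FEAT) regional features; captions: (B, T) token ids
        vis = self.img_proj(img_feats)                  # (B, R, HID)
        h, _ = self.decoder(self.embed(captions))       # (B, T, HID)
        ctx, _ = self.attn(h, vis, vis)                 # text queries attend to vision
        return self.out(h + ctx)                        # (B, T, VOCAB) token logits

class CNNDiscriminator(nn.Module):
    """TextCNN-style discriminator scoring image/caption pairs as real or generated."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.convs = nn.ModuleList(
            nn.Conv1d(EMB, 128, kernel_size=k) for k in (3, 4, 5))
        self.img_proj = nn.Linear(IMG_FEAT, 128)
        self.score = nn.Linear(128 * 3 + 128, 1)

    def forward(self, img_feats, captions):
        x = self.embed(captions).transpose(1, 2)        # (B, EMB, T)
        feats = [conv(x).relu().max(dim=2).values for conv in self.convs]
        img = self.img_proj(img_feats.mean(dim=1))      # pooled global image feature
        return self.score(torch.cat(feats + [img], dim=1))  # real/fake logit

G, D = CaptionGenerator(), CNNDiscriminator()
opt_g = torch.optim.RMSprop(G.parameters(), lr=5e-5)    # RMSProp, as in the abstract
opt_d = torch.optim.RMSprop(D.parameters(), lr=5e-5)
```

In an adversarial training loop, the discriminator would be updated to separate ground-truth emotional captions from generated ones for the same image, while the generator is rewarded for captions the discriminator accepts, which is what drives the cross-modal emotion matching described above.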
Key words: generative adversarial network, image emotion captioning, RMSProp optimization algorithm
YANG Chunmiao, WANG Yang, HAN Liying, SUN Hebin. Cross-modal image emotion perception captioning based on generative adversarial network[J]. Computer Engineering & Science, 2025, 47(5): 894-901.
URL: http://joces.nudt.edu.cn/EN/Y2025/V47/I5/894