• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (11): 2056-2066.

• Artificial Intelligence and Data Mining •

Image-text emotion classification based on visual feature enhancement and bidirectional interaction fusion

WANG Luyao, HU Huijun, LIU Maofu

  (1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065;
    2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan 430065, China)
  • Received: 2024-02-27 Revised: 2024-07-13 Online: 2025-11-25 Published: 2025-12-08

Abstract: Multimodal sentiment analysis, which aims to predict emotion from multimodal information such as text and images, has attracted growing attention. Compared with text, the visual modality serves as an auxiliary modality and may contain more redundant or confounding information unrelated to emotion, and existing research does not fully consider the interaction and complementarity between modalities. To address these issues, an image-text emotion classification model based on visual feature enhancement and bidirectional interactive fusion (VFEBIF) is proposed. Its fine-grained visual feature enhancement module uses structured knowledge from scene graphs and CLIP-based filtering to extract, from the text, keywords related to visual semantics, and uses them to enhance local visual features. Its bidirectional interactive fusion module performs inter-modal interaction in parallel and fuses the multimodal features to exploit the complementary information between text and image for emotion classification. Experiments on two public datasets, TumEmo and MVSA-Single, show that VFEBIF outperforms most existing approaches and effectively improves sentiment classification performance.
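
The abstract does not give the internals of the bidirectional interactive fusion module, so the following is only an illustrative sketch of the general idea of parallel two-way interaction, not the VFEBIF implementation. It assumes plain scaled dot-product cross-attention without learned projections, mean pooling, and concatenation; the function names (`cross_attend`, `mean_pool`, `bidirectional_fuse`) and the toy feature vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys_values):
    """Scaled dot-product attention: each query attends over keys_values.

    Assumption: keys and values are the same vectors and no learned
    projection matrices are applied (a real model would learn them).
    """
    scale = math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, keys_values))
             for i in range(len(q))]
        )
    return attended


def mean_pool(vectors):
    """Average a list of vectors into a single vector."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def bidirectional_fuse(text_feats, image_feats):
    """Run both interaction directions independently, then concatenate."""
    text_to_image = cross_attend(text_feats, image_feats)  # text queries image
    image_to_text = cross_attend(image_feats, text_feats)  # image queries text
    return mean_pool(text_to_image) + mean_pool(image_to_text)


# Toy 4-dimensional features: three text tokens and two image regions.
text_feats = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
image_feats = [[0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0]]

fused = bidirectional_fuse(text_feats, image_feats)
print(len(fused))  # length of the concatenated fused vector
```

In this sketch the two attention passes do not depend on each other, which is what allows them to run in parallel; a classifier head over the fused vector would then produce the emotion label.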

Key words: multimodal sentiment analysis, image-text emotion classification, visual feature enhancement, bidirectional interactive fusion