• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (2): 341-352.

• Artificial Intelligence and Data Mining •


  • Funding:
    "14th Five-Year Plan" Hubei Province Advantageous and Characteristic Discipline (Group) Project (2023D0302); Open Fund of the Hubei Provincial Key Laboratory of Digital Finance Innovation (DFIK2024Y06)

Aspect-based multimodal sentiment analysis based on panoramic semantics and multi-level feature fusion

ZHANG Yang, HU Huijun, LIU Maofu

  (1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China;
   2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan 430065, China)
  • Online: 2026-02-25  Published: 2026-03-10


Abstract: Aspect-based multimodal sentiment analysis currently faces a scarcity of Chinese datasets and imbalanced category distributions in related tasks. Traditional models often ignore the local dependencies of words when processing sentiment information, leading to insufficient global semantic understanding and making it difficult to locate sentiment information accurately. In addition, irrelevant information is hard to screen out effectively during multimodal fusion, which degrades the accuracy of sentiment classification. To address these problems, this paper constructs a high-quality multimodal Chinese dataset named WAMSA and proposes an aspect-based multimodal sentiment analysis model based on panoramic semantics and multi-level feature fusion (PSMFF). The model employs a panoramic semantic network module to integrate textual features with semantic expansion information, using a GCN and graph encoders to capture fine-grained and coarse-grained semantic features. The multi-level feature fusion module extracts relevant image features through local guidance, enhances them via a Transformer, and then fuses them with textual features through global guidance to generate rich multimodal representations. Experimental results demonstrate that PSMFF outperforms multiple baseline models on three datasets.
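As a rough illustration of the fusion strategy the abstract describes, the NumPy sketch below implements local guidance as cross-attention of text tokens over image patches, uses a single self-attention step as a stand-in for the Transformer enhancement, and realizes global guidance as a learned projection of the concatenated text and image features. All names, dimensions, and the random projection matrix are illustrative assumptions, not the authors' PSMFF implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, kv):
    # Scaled dot-product attention: rows of q attend over rows of kv
    d = q.shape[-1]
    weights = softmax(q @ kv.T / np.sqrt(d), axis=-1)
    return weights @ kv

def fuse(text_feats, image_feats, rng):
    d = text_feats.shape[-1]
    # Local guidance: each text token gathers relevant image-patch features
    img_local = cross_attention(text_feats, image_feats)
    # Enhancement step (stand-in for the Transformer encoder in the paper)
    img_enh = cross_attention(img_local, img_local)
    # Global guidance: concatenate both modalities and project back to d dims
    W = rng.normal(size=(2 * d, d)) / np.sqrt(2 * d)
    return np.concatenate([text_feats, img_enh], axis=-1) @ W

rng = np.random.default_rng(0)
T = rng.normal(size=(8, 64))    # 8 text tokens, 64-dim features
V = rng.normal(size=(49, 64))   # 7x7 grid of image-patch features
M = fuse(T, V, rng)
print(M.shape)                  # (8, 64): one fused vector per text token
```

In a trained model the projection `W` and the attention parameters would be learned jointly with the rest of the network; the point here is only the data flow from local image selection to global multimodal fusion.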

Key words: aspect-based multimodal sentiment analysis, Weibo aspect-level multimodal sentiment analysis (WAMSA) dataset, panoramic semantic network, multi-level feature fusion