• Official journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (03): 512-519.

• Artificial Intelligence and Data Mining •

A Chinese sentiment analysis model combining character and word information

YANG Chun-xia1,2,3, YAO Si-cheng1,2,3, SONG Jin-jian1,2,3

  (1. School of Automation, Nanjing University of Information Science & Technology, Nanjing 210044;
    2. Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing 210044;
    3. Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing 210044, China)
  • Received: 2021-07-05 Revised: 2021-11-21 Accepted: 2023-03-25 Online: 2023-03-25 Published: 2023-03-23
  • Funding:
    National Natural Science Foundation of China (61273229)

Abstract: Chinese sentiment analysis models usually use only word-granularity information for text representation, so the model loses character-granularity characteristics during feature extraction. In addition, commonly used word segmentation methods produce overly concise segmentation results, which limits the richness of the text representation to some extent. To address this, a Chinese sentiment analysis model that fuses character-granularity and word-granularity features is proposed. Full pattern word segmentation is used to obtain a richer word sequence. After word embedding, the word vectors are input into a Bi-LSTM to extract the semantic information of the full text, and the hidden-layer semantic representation is initially fused with the corresponding character vectors to enhance the robustness of the word-level information. In parallel, the character vectors are input into a multi-window convolution to capture finer-grained character-level feature information. Finally, the character-granularity and word-granularity features are further fused and input into the classifier to obtain the sentiment classification result. Performance tests on two public datasets show that the model achieves better classification performance than comparable models.

Key words: Chinese sentiment analysis, full pattern word segmentation, multi-granularity fusion
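To illustrate why full pattern word segmentation yields a richer word sequence than a single best segmentation, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `full_mode_segment`, the toy vocabulary `VOCAB`, and the `max_len` limit are all assumptions made for illustration. The idea resembles the "full mode" offered by segmenters such as jieba (`cut_all=True`), but the scanning rule here is deliberately simplified.

```python
# Hypothetical toy vocabulary; a real system would use a full dictionary.
VOCAB = {"来到", "北京", "清华", "清华大学", "华大", "大学"}


def full_mode_segment(text, vocab, max_len=4):
    """Emit every vocabulary word found in `text`, scanning left to right.

    A single character is emitted only when no vocabulary word starts at
    that position and no earlier word already covers it.
    """
    words = []
    covered = -1  # index of the last character covered by an emitted word
    for start in range(len(text)):
        last_end = min(len(text), start + max_len)
        matches = [
            text[start:end]
            for end in range(start + 2, last_end + 1)
            if text[start:end] in vocab
        ]
        if matches:
            words.extend(matches)
            covered = max(covered, start + len(matches[-1]) - 1)
        elif start > covered:
            words.append(text[start])
            covered = start
    return words


result = full_mode_segment("我来到北京清华大学", VOCAB)
print(result)
# ['我', '来到', '北京', '清华', '清华大学', '华大', '大学']
```

Overlapping candidates such as 清华, 清华大学, 华大, and 大学 all appear in the output, whereas a single best segmentation would typically keep only 清华大学. This overlap is the extra word-level information the abstract refers to.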