• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2023, Vol. 45, Issue (02): 338-345.

• Artificial Intelligence and Data Mining •

A cross-language sentiment classification model based on emotional semantic confrontation

ZHAO Ya-li1,2, YU Zheng-tao1,2, GUO Jun-jun1,2, GAO Sheng-xiang1,2, XIANG Yan1,2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
  2. Artificial Intelligence Key Laboratory of Yunnan Province, Kunming 650500, China

  • Received: 2021-04-26  Revised: 2021-09-05  Accepted: 2023-02-25  Online: 2023-02-25  Published: 2023-02-16
  • Funding:
    National Natural Science Foundation of China (61972186, 61762056, 61732005, 61761026); National Key Research and Development Program of China (2018YFC0830105, 2018YFC0830101, 2018YFC0830100); Yunnan High-Tech Industry Special Project (201606); Yunnan Major Science and Technology Special Project (202002AD080001-5); Yunnan Basic Research Program (202001AT070047)

Abstract: Traditional cross-language sentiment classification methods based on machine translation are limited by translation quality, which leads to low sentiment classification accuracy for low-resource languages such as Vietnamese. To address the imbalance in labeled resources between the source and target languages, this paper proposes a cross-language sentiment classification model based on sentiment-semantic adversarial learning. First, each sentence is concatenated with the sentiment words it contains, and a convolutional neural network extracts features from the concatenated input for each language separately, yielding sentiment semantic representations in each monolingual semantic space. Second, an adversarial network aligns the sentiment semantic representations of labeled and unlabeled data in a bilingual sentiment semantic space. Finally, the most salient representations of the sentence and of the sentiment words are concatenated to produce the sentiment classification result. Experiments on a public Chinese-English dataset and a self-built Chinese-Vietnamese dataset show that, compared with mainstream cross-language sentiment classification models, the proposed model achieves bilingual sentiment semantic alignment and effectively improves sentiment classification accuracy for Vietnamese. It also shows clear advantages on language pairs that differ in how dissimilar the two languages are.

Key words: sentiment semantic representation, bilingual word embedding, low-resource language, cross-language sentiment classification
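
The three steps named in the abstract (concatenating a sentence with its sentiment words, extracting salient CNN features, and aligning labeled and unlabeled representations adversarially before classification) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the abstract does not give the architecture in detail, so the lexicon, the `[SEP]` marker, the `embed` stand-in for bilingual word embeddings, the filter sizes, the linear discriminator and classifier, and the choice to pool the sentence part and the sentiment-word part separately are all hypothetical. Only the forward pass and the two adversarial objectives are shown; no training loop is included.

```python
import math
import random

DIM, WIDTH, N_FILTERS = 8, 2, 4          # toy sizes, chosen only for illustration
LEXICON = {"好", "差", "tốt", "tệ"}       # hypothetical sentiment lexicon


def embed(token):
    """Deterministic toy embedding; a stand-in for bilingual word embeddings."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def splice(tokens):
    """Step 1: append the sentence's sentiment words to the sentence itself."""
    return tokens + ["[SEP]"] + [t for t in tokens if t in LEXICON]


def max_pool_conv(tokens, filters):
    """Slide each filter over windows of WIDTH tokens; keep its largest ReLU activation."""
    vectors = [embed(t) for t in tokens]
    while len(vectors) < WIDTH:           # zero-pad inputs shorter than one window
        vectors.append([0.0] * DIM)
    pooled = []
    for filt in filters:
        best = 0.0                        # starting at 0 applies ReLU before the max
        for start in range(len(vectors) - WIDTH + 1):
            window = [x for vec in vectors[start:start + WIDTH] for x in vec]
            best = max(best, sum(w * x for w, x in zip(filt, window)))
        pooled.append(best)
    return pooled


def encode(tokens, filters):
    """Pool the sentence part and the sentiment-word part, then concatenate them."""
    spliced = splice(tokens)
    cut = spliced.index("[SEP]")
    return max_pool_conv(spliced[:cut], filters) + max_pool_conv(spliced[cut + 1:], filters)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(weights, features):
    return sum(w * f for w, f in zip(weights, features))


def adversarial_losses(src_feats, tgt_feats, disc_w):
    """Step 2: cross-entropy of a language discriminator (source = 1, target = 0).

    The discriminator would minimise disc_loss; the encoder would minimise its
    negation, which pushes both languages toward one shared feature space.
    """
    eps = 1e-9
    disc_loss = 0.0
    for f in src_feats:
        disc_loss -= math.log(sigmoid(linear(disc_w, f)) + eps)
    for f in tgt_feats:
        disc_loss -= math.log(1.0 - sigmoid(linear(disc_w, f)) + eps)
    disc_loss /= len(src_feats) + len(tgt_feats)
    return disc_loss, -disc_loss


def classify(features, clf_w):
    """Step 3: probability that the sentence is positive (untrained weights here)."""
    return sigmoid(linear(clf_w, features))


rng = random.Random(0)
filters = [[rng.uniform(-1, 1) for _ in range(WIDTH * DIM)] for _ in range(N_FILTERS)]
disc_w = [rng.uniform(-1, 1) for _ in range(2 * N_FILTERS)]
clf_w = [rng.uniform(-1, 1) for _ in range(2 * N_FILTERS)]

labeled_src = [encode(["食物", "很", "好"], filters)]            # labeled source language
unlabeled_tgt = [encode(["món", "ăn", "rất", "tốt"], filters)]  # unlabeled target language
disc_loss, enc_adv_loss = adversarial_losses(labeled_src, unlabeled_tgt, disc_w)
p_positive = classify(unlabeled_tgt[0], clf_w)
```

In a real system the filters, discriminator, and classifier would be trained jointly, with the classifier supervised only on the labeled source data; here all weights are random, so `p_positive` carries no meaning beyond showing the data flow.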