A cross-language sentiment classification model based on emotional semantic confrontation

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (02): 338-345.

ZHAO Ya-li 1,2, YU Zheng-tao 1,2, GUO Jun-jun 1,2, GAO Sheng-xiang 1,2, XIANG Yan 1,2

(1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China;
2. Artificial Intelligence Key Laboratory of Yunnan Province, Kunming 650500, China)

Received: 2021-04-26 Revised: 2021-09-05 Accepted: 2023-02-25 Online: 2023-02-25 Published: 2023-02-16

Supported by:
National Natural Science Foundation of China (61972186, 61762056, 61732005, 61761026); National Key Research and Development Program of China (2018YFC0830105, 2018YFC0830101, 2018YFC0830100); Yunnan High-Tech Industry Special Project (201606); Yunnan Major Science and Technology Special Project (202002AD080001-5); Yunnan Fundamental Research Program (202001AT070047)

Abstract

Traditional cross-language sentiment classification methods based on machine translation are limited by translation quality, which leads to low sentiment classification accuracy for low-resource languages such as Vietnamese. To address the imbalance in labeled resources between the source and target languages, this paper proposes a cross-language sentiment classification model based on sentiment-semantic adversarial learning. First, each sentence is concatenated with the sentiment words it contains, and a convolutional neural network extracts features from the concatenated input for each language separately, yielding sentiment semantic representations in each monolingual semantic space. Second, an adversarial network aligns the sentiment semantic representations of labeled and unlabeled data in the bilingual sentiment semantic space. Finally, the most salient representations of the sentence and of the sentiment words are concatenated to produce the sentiment classification result. Experiments on a public Chinese-English dataset and a self-constructed Chinese-Vietnamese dataset show that, compared with mainstream cross-language sentiment classification models, the proposed model achieves bilingual sentiment semantic alignment, effectively improves the accuracy of Vietnamese sentiment classification, and retains a clear advantage on language pairs that differ in how dissimilar the two languages are.

Key words: emotional semantic representation, bilingual word embedding, low-resource language, cross-language sentiment classification
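
The three steps summarized in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it is one possible reading of the pipeline, written in plain Python under several assumptions. The embeddings are random rather than bilingual word embeddings, the sentiment lexicon is hand-picked, the critic is a single linear layer, training is omitted, and every function name (`concat_with_sentiment_words`, `conv_max_pool`, `encode`, `critic_gap`) is hypothetical.

```python
import random

def concat_with_sentiment_words(tokens, lexicon):
    """Step 1: append the sentiment words found in a sentence to the sentence."""
    sentiment = [t for t in tokens if t in lexicon]
    return tokens + sentiment, sentiment

def embed(tokens, table, dim, rng):
    """Toy embedding lookup: unseen tokens get a random vector (assumption)."""
    for t in tokens:
        if t not in table:
            table[t] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return [table[t] for t in tokens]

def conv_max_pool(vectors, filters, width):
    """1-D convolution with ReLU, then max-over-time pooling per filter."""
    dim = len(filters[0]) // width
    # Pad short inputs with zero vectors so that at least one window exists.
    padded = vectors + [[0.0] * dim] * max(0, width - len(vectors))
    pooled = []
    for f in filters:
        best = float("-inf")
        for start in range(len(padded) - width + 1):
            window = [x for vec in padded[start:start + width] for x in vec]
            activation = max(0.0, sum(w * x for w, x in zip(f, window)))
            best = max(best, activation)
        pooled.append(best)  # most salient response of this filter
    return pooled

def encode(tokens, lexicon, table, filters, width, dim, rng):
    """Step 3: concatenate the pooled sentence and sentiment-word features."""
    joined, sentiment = concat_with_sentiment_words(tokens, lexicon)
    sent_feat = conv_max_pool(embed(joined, table, dim, rng), filters, width)
    word_feat = conv_max_pool(embed(sentiment, table, dim, rng), filters, width)
    return sent_feat + word_feat

def critic_gap(source_feats, target_feats, critic_weights):
    """Step 2: mean critic score on source minus mean critic score on target.

    In adversarial alignment the critic is trained to widen this gap while
    the feature extractor is trained to shrink it (training not shown).
    """
    def mean_score(feats):
        scores = [sum(w * x for w, x in zip(critic_weights, f)) for f in feats]
        return sum(scores) / len(scores)
    return mean_score(source_feats) - mean_score(target_feats)

# Toy usage: one labeled source sentence, one unlabeled target sentence.
rng = random.Random(0)
dim, width, n_filters = 4, 2, 3
filters = [[rng.uniform(-1.0, 1.0) for _ in range(dim * width)]
           for _ in range(n_filters)]
table = {}
lexicon = {"好", "差", "tốt"}
src = [encode("这家 餐厅 很 好".split(), lexicon, table, filters, width, dim, rng)]
tgt = [encode("món ăn rất tốt".split(), lexicon, table, filters, width, dim, rng)]
critic = [rng.uniform(-1.0, 1.0) for _ in range(2 * n_filters)]
gap = critic_gap(src, tgt, critic)
```

Each sentence is mapped to a vector of `2 * n_filters` pooled features, and `gap` is the quantity an adversarial training loop would push toward zero so that source and target representations become hard to tell apart. The classifier that consumes the concatenated features is left out of the sketch.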

ZHAO Ya-li, YU Zheng-tao, GUO Jun-jun, GAO Sheng-xiang, XIANG Yan. A cross-language sentiment classification model based on emotional semantic confrontation[J]. Computer Engineering & Science, 2023, 45(02): 338-345.
