• Journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (2): 330-340.

• Artificial Intelligence and Data Mining •

Emotion recognition in conversation based on data augmentation

田宇,李军辉,朱苏阳,周国栋   

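  (1. School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu 215006; 2. Computing Science and Artificial Intelligence College, Suzhou City University, Suzhou, Jiangsu 215104)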

  • Received: 2024-11-21 Revised: 2025-04-13 Online: 2026-02-25 Published: 2026-03-10

Data augmentation-based emotion recognition in conversation

TIAN Yu, LI Junhui, ZHU Suyang, ZHOU Guodong

  (1. School of Computer Science & Technology, Soochow University, Suzhou 215006; 2. Computing Science and Artificial Intelligence College, Suzhou City University, Suzhou 215104, China)
  • Received: 2024-11-21 Revised: 2025-04-13 Online: 2026-02-25 Published: 2026-03-10

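Abstract: Emotion recognition in conversation aims to classify the emotion of every utterance in a conversation. However, the label distribution in datasets is usually markedly imbalanced. To address this problem, a data augmentation-based method is adopted to improve model performance under label imbalance. Specifically, the generative ability of large language models is used to expand the data for low-frequency labels through three methods: back-translation, sentence paraphrasing, and dialogue generation. The generated samples are then selected according to the harmonic mean of cosine similarity and self-edit distance. Experimental results on multiple datasets show that models trained with this method perform better under label imbalance, with significant improvements over other state-of-the-art models in weighted F1 and in the recognition of minority-class samples.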


Keywords: data augmentation, emotion recognition in conversation, large language model

Abstract: Emotion recognition in conversation aims to classify the emotion of each utterance within a conversation. However, the label distribution in most datasets is significantly imbalanced. To address this issue, a data augmentation approach is proposed to improve model performance under label imbalance. Specifically, large language models are used to generate additional samples for low-frequency labels through three methods: back-translation, paraphrasing, and dialogue generation. The generated samples are then filtered based on the harmonic mean of cosine similarity and self-Levenshtein distance. Experimental results on multiple datasets show that this method improves model performance on imbalanced datasets, achieving gains in weighted F1 score and in the recognition of minority labels over other state-of-the-art models.


Key words: data augmentation, emotion recognition in conversation, large language models
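
The selection step named in the abstract (the harmonic mean of cosine similarity and self-Levenshtein distance) can be illustrated with a minimal Python sketch. The abstract does not say how either term is computed, so everything below is an assumption: cosine similarity is taken over bag-of-words count vectors as a stand-in for sentence embeddings, the distance is a character-level Levenshtein distance between a generated sample and its source utterance normalized to [0, 1], and the function names (`selection_score`, `select_top_k`) are hypothetical. Under these assumptions, the harmonic mean rewards candidates that keep the meaning of the source while differing from it on the surface.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words count vectors (assumed stand-in for embeddings)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length, so the value lies in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def selection_score(source: str, candidate: str) -> float:
    """Harmonic mean of similarity (fidelity) and normalized distance (diversity)."""
    sim = cosine_similarity(source, candidate)
    dist = normalized_distance(source, candidate)
    return 2 * sim * dist / (sim + dist) if sim + dist else 0.0


def select_top_k(source: str, candidates: list[str], k: int) -> list[str]:
    """Keep the k candidates with the highest selection score."""
    ranked = sorted(candidates, key=lambda c: selection_score(source, c), reverse=True)
    return ranked[:k]
```

With this scoring, an exact copy of the source utterance receives a score of 0 because its distance term is 0, and a candidate that shares no words with the source also receives 0 because its similarity term is 0. Only candidates that balance both terms rank highly.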