• Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (10): 1864-1874.

• Artificial Intelligence and Data Mining •

Text classification combining feature projection and negative supervision

FENG Xing-jie (冯兴杰), CAO Ruo-xuan (曹若轩)

  1. (College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China)

  • Received: 2023-05-18 Revised: 2023-11-25 Accepted: 2024-10-25 Online: 2024-10-25 Published: 2024-10-30
  • Funding:
    National Key R&D Program of China (2020YFB1600101); Fundamental Research Funds for the Central Universities (3122020052)

Abstract: Texts to be classified often suffer from semantic ambiguity and sparse features, and some words in a sentence may carry meanings that are inconsistent with the semantics of the text's true label; both problems lead to classification errors. To address these issues, a multi-task text classification model combining feature projection and negative supervision is proposed. The main task uses a feature projection network to extract purified vectors with distinct class features and classifies them. The auxiliary task applies negative supervision to the model, widening the gap between the vectors of texts from different classes and removing the negative influence of individual words. In addition, RoBERTa and BiLSTM are used together to extract features from both positive and negative samples, capturing rich semantic information. Experiments on the THUCNews news-title classification dataset and the Weilidai (微粒贷, micro-loan) semantic similarity analysis dataset show that the proposed model performs better than existing models.


Keywords: text classification, feature projection, negative supervision, multi-task model, RoBERTa, BiLSTM
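
The abstract names two mechanisms, feature projection and negative supervision, without giving their formulas. The sketch below illustrates one common way such ideas are realized and is not taken from the paper: it assumes that "purifying" a vector means removing its component along a shared (class-neutral) feature direction by orthogonal projection, and that the negative-supervision signal is the mean cosine similarity between a text vector and vectors of texts from other classes. All function names (`project`, `purify`, `negative_supervision_penalty`) are hypothetical, and plain Python lists stand in for the RoBERTa/BiLSTM feature vectors.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(u: Sequence[float], v: Sequence[float]) -> Vector:
    """Orthogonal projection of u onto the direction of v."""
    denom = dot(v, v)
    if denom == 0.0:
        return [0.0] * len(u)
    scale = dot(u, v) / denom
    return [scale * b for b in v]


def purify(feature: Sequence[float], common: Sequence[float]) -> Vector:
    """Assumed purification step: drop the part of `feature` that lies
    along the shared direction `common`, keeping the orthogonal remainder."""
    shared = project(feature, common)
    return [a - b for a, b in zip(feature, shared)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def negative_supervision_penalty(anchor: Sequence[float],
                                 negatives: Sequence[Sequence[float]]) -> float:
    """Assumed auxiliary loss: mean cosine similarity between the anchor
    text vector and vectors of texts with a different label.
    Minimizing it pushes different-class vectors apart."""
    if not negatives:
        return 0.0
    return sum(cosine(anchor, n) for n in negatives) / len(negatives)


if __name__ == "__main__":
    # The shared direction is the x-axis, so only the y-component survives.
    print(purify([3.0, 4.0], [1.0, 0.0]))  # [0.0, 4.0]
    # One identical negative (similarity 1) and one orthogonal (similarity 0).
    print(negative_supervision_penalty([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # 0.5
```

In a multi-task setup of the kind the abstract describes, a penalty like this would typically be added to the main classification loss with a weighting coefficient; the weighting and the exact projection used in the paper are not stated here.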