• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (06): 1092-1100.

• Artificial Intelligence and Data Mining •

Corrective-Net: A label association learning module for multi-label text classification

XIAO Xin-zheng1,2,3, HUANG Rui-zhang1,2,3, CHEN Yan-ping1,2,3, QIN Yong-bin1,2,3, SONG Yu-mei1,2,3, ZHOU Yu-lin1,2,3

  1. Text Computing & Cognitive Intelligence Engineering Research Center of National Education Ministry, Guiyang 550025, China
  2. State Key Laboratory of Public Big Data, Guiyang 550025, China
  3. College of Computer Science & Technology, Guizhou University, Guiyang 550025, China

  • Received: 2023-09-01  Revised: 2023-10-30  Accepted: 2024-06-25  Online: 2024-06-25  Published: 2024-06-18
  • Supported by:
    National Natural Science Foundation of China (62066007, 62066008); Scientific Research Project of Higher Education Institutions, Guizhou Provincial Department of Education (Youth Project) (Qianjiaoji [2022] No. 149)

Abstract: Current multi-label text classification faces two main problems: (1) existing methods focus on learning text representations and model the associations between labels insufficiently; (2) methods that do use label association information to improve multi-label classification rely too heavily on manually predefined external knowledge, which is costly to acquire and therefore limits practical application. To address these problems, this paper proposes Corrective-Net, a label association learning module for multi-label text classification. The module automatically learns label association information from the data without relying on external knowledge. It also uses this information to correct the initial predictions of the base classification module, so that the final prediction takes both semantic information and label association information into account and is therefore more accurate. Extensive experiments on the AAPD and SO datasets demonstrate the generality and effectiveness of Corrective-Net. By analyzing the effect of label correction on the performance of each label, explicit label association information is obtained and visualized.

Key words: label association, label correction, multi-label, text classification, visualization
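
The abstract describes a two-stage idea: a base classifier produces initial label predictions, and a correction step adjusts them using label associations. The abstract does not give the architecture, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes the correction is an additive term computed from a label-to-label weight matrix. The names `correct_logits`, `predict`, and `assoc` are hypothetical, and the association weights are set by hand here, whereas the paper learns them from data.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def correct_logits(initial_logits, assoc):
    """Adjust each label's logit using the other labels' initial probabilities.

    assoc[i][j] is a hypothetical weight: positive if label i supports
    label j, negative if it suppresses it. This additive form is an
    assumption made for illustration only.
    """
    probs = [sigmoid(z) for z in initial_logits]
    n = len(initial_logits)
    corrected = []
    for j in range(n):
        # Evidence for label j contributed by every other label i.
        evidence = sum(probs[i] * assoc[i][j] for i in range(n) if i != j)
        corrected.append(initial_logits[j] + evidence)
    return corrected


def predict(logits, threshold=0.5):
    """Turn logits into a 0/1 multi-label decision vector."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


# Toy example with three labels; label 0 strongly supports label 1.
initial = [3.0, -0.5, -2.0]  # initial logits from a base classifier
assoc = [
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
corrected = correct_logits(initial, assoc)
before = predict(initial)    # [1, 0, 0]
after = predict(corrected)   # [1, 1, 0]
print(before, after)
```

In this toy case the base prediction misses label 1, and the correction recovers it because label 0 is confidently predicted and is associated with label 1. Inspecting a matrix like `assoc` is also one plausible way to obtain explicit, visualizable label associations of the kind the abstract mentions.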