• Official Journal of the China Computer Federation
• China Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science


A multi-label text classification model based on centroid

LI Xiao-lin (1,2,3), WANG Cheng (1,2)

  (1. College of Communication and Information Engineering,
    Chongqing University of Posts and Telecommunications, Chongqing 400065;
    2. Research Center of New Telecommunication Technology Applications,
    Chongqing University of Posts and Telecommunications, Chongqing 400065;
    3. Chongqing Information Technology Designing Limited Company, Chongqing 400021, China)
Received: 2019-09-02   Revised: 2019-12-11   Online: 2020-06-25   Published: 2020-06-25

Abstract:

To address the low classification accuracy and high computational complexity of current multi-label classification algorithms, a centroid-based multi-label text categorization model, named the Multi-label Gravitation Model (ML-GM), is proposed. In the training phase, a similarity interval is obtained for each class by calculating the similarity between the training documents and the class centroid. In the test phase, multi-label classification is performed by checking whether the similarity between an unlabeled document and each class centroid falls within that class's similarity interval. By introducing a centroid classifier and a gravitation model, the model addresses the problems of high computational complexity and low classification accuracy. Experiments on the Yahoo dataset show that ML-GM achieves superior performance in terms of average precision, AUC, one-error and Hamming loss.
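The training and test phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' ML-GM implementation: the abstract does not specify the gravitation model, the similarity measure, or how the interval bounds are chosen. The sketch therefore assumes non-negative term-weight vectors, cosine similarity, class centroids computed as the normalized mean of each class's training documents, and a per-class interval equal to the [min, max] range of the members' similarities to that centroid. The class name `CentroidIntervalClassifier` and the `tol` parameter are hypothetical.

```python
import numpy as np


def _normalize(rows: np.ndarray) -> np.ndarray:
    """L2-normalize each row; all-zero rows stay zero."""
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return rows / norms


class CentroidIntervalClassifier:
    """Hypothetical centroid + similarity-interval multi-label classifier.

    Assumed (not from the paper): cosine similarity, centroids as the
    normalized mean of member vectors, intervals as [min, max] member
    similarity. ML-GM's gravitation model is not reproduced here.
    """

    def __init__(self, tol: float = 1e-9):
        # Small slack so documents lying exactly on a bound are not lost to rounding.
        self.tol = tol

    def fit(self, X, Y):
        # X: (n_docs, n_features) term weights; Y: (n_docs, n_labels) 0/1 labels.
        Xn = _normalize(np.asarray(X, dtype=float))
        Y = np.asarray(Y, dtype=bool)
        n_labels = Y.shape[1]
        self.centroids_ = np.zeros((n_labels, Xn.shape[1]))
        # Empty interval by default: a label with no training documents never fires.
        self.intervals_ = np.tile(np.array([np.inf, -np.inf]), (n_labels, 1))
        for c in range(n_labels):
            members = Xn[Y[:, c]]
            if len(members) == 0:
                continue
            centroid = _normalize(members.mean(axis=0, keepdims=True))[0]
            sims = members @ centroid  # cosine similarity of each member to the centroid
            self.centroids_[c] = centroid
            self.intervals_[c] = (sims.min(), sims.max())
        return self

    def similarities(self, X):
        # (n_docs, n_labels) cosine similarity of each document to each class centroid.
        return _normalize(np.asarray(X, dtype=float)) @ self.centroids_.T

    def predict(self, X):
        # Assign label c when the similarity lies inside class c's interval.
        S = self.similarities(X)
        lo = self.intervals_[:, 0] - self.tol
        hi = self.intervals_[:, 1] + self.tol
        return ((S >= lo) & (S <= hi)).astype(int)
```

Typical use is `CentroidIntervalClassifier().fit(X_train, Y_train).predict(X_test)`. Because the upper bound here is the largest training similarity, a test document that is closer to a centroid than every training member falls outside the interval, and a document outside every interval receives no label; capping the upper bound at 1 or falling back to the most similar class are possible variants, not something the abstract describes.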

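Two of the evaluation measures named in the abstract, Hamming loss and one-error, have standard definitions in the multi-label learning literature and take only a few lines to compute. The sketch below follows those textbook definitions (Hamming loss averaged over all document-label pairs, one-error over each document's top-scored label); the abstract does not give the exact formulas used in the experiments, and the function names are illustrative.

```python
import numpy as np


def hamming_loss(Y_true, Y_pred) -> float:
    """Fraction of (document, label) pairs predicted incorrectly."""
    Y_true = np.asarray(Y_true, dtype=bool)
    Y_pred = np.asarray(Y_pred, dtype=bool)
    return float(np.mean(Y_true != Y_pred))


def one_error(Y_true, scores) -> float:
    """Fraction of documents whose top-scored label is not a true label."""
    Y_true = np.asarray(Y_true, dtype=bool)
    top = np.argmax(np.asarray(scores), axis=1)  # index of the highest-scored label per document
    return float(np.mean(~Y_true[np.arange(len(top)), top]))
```

Lower is better for both measures. `scores` can be any per-label ranking score, for example document-to-centroid similarities.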
Key words: text classification, centroid-based classifier, multi-label learning, gravitation model, similarity interval