• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


An improved KNN short text classification algorithm based on category feature words
 

HUANG Xian-ying, XIONG Li-yuan, LIU Ying-tao, LI Qin-dong

  1. (College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China)
  • Received: 2016-05-04 Revised: 2016-06-23 Online: 2018-01-25 Published: 2018-01-25

Abstract:

KNN-based short text classification algorithms improve accuracy by expanding the content of short texts, but this expansion reduces classification efficiency. To address this problem, we use the chi-square statistic (CHI) to extract category feature words from each category of the training set. Based on the similarity between the samples of each category and that category's feature words, the existing training set is split and refined, so that each category is divided into several training subsets, each containing part of its samples. For a given test text, the samples of the training subsets most similar to it are then extracted to reconstruct a training set specific to that test text. Reducing the number of text pairs that KNN must compare in this way increases the efficiency of short text classification. Experimental results show that, compared with the KNN short text classification algorithm based on HowNet, the proposed algorithm increases classification efficiency by about 50 percent and also improves classification accuracy to some extent.
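
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the details, so several points are assumptions: texts are already tokenized into sets of words, a sample joins the subset of every feature word of its own category that it contains, candidate samples are those in subsets whose feature word occurs in the test text, and plain Jaccard overlap stands in for the HowNet-based similarity. All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def category_chi(docs, labels, term, category):
    """Chi-square score of a term for a category from a 2x2 document-count table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term, in_cat = term in tokens, label == category
        if has_term and in_cat:
            a += 1          # term present, document in category
        elif has_term:
            b += 1          # term present, document in another category
        elif in_cat:
            c += 1          # term absent, document in category
        else:
            d += 1          # term absent, document in another category
    if a * d <= b * c:
        return 0.0          # assumption: ignore terms not positively tied to the category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def category_feature_words(docs, labels, top_k=2):
    """Pick the top_k highest-scoring terms per category (ties broken alphabetically)."""
    vocab = set().union(*docs)
    features = {}
    for cat in set(labels):
        scores = {t: category_chi(docs, labels, t, cat) for t in vocab}
        ranked = sorted(vocab, key=lambda t: (-scores[t], t))
        features[cat] = [t for t in ranked[:top_k] if scores[t] > 0]
    return features

def build_subsets(docs, labels, features):
    """Split each category into subsets keyed by (category, feature word)."""
    subsets = defaultdict(list)
    for i, (tokens, label) in enumerate(zip(docs, labels)):
        for word in features[label]:
            if word in tokens:
                subsets[(label, word)].append(i)
    return subsets

def jaccard(a, b):
    """Set-overlap similarity; a stand-in for a semantic (e.g. HowNet-based) measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def classify(test_tokens, docs, labels, subsets, k=3):
    """KNN vote over a reduced training set rebuilt for this test text."""
    candidates = {i for (_, word), idxs in subsets.items()
                  if word in test_tokens for i in idxs}
    if not candidates:
        candidates = set(range(len(docs)))   # fall back to the full training set
    nearest = sorted(candidates, key=lambda i: (-jaccard(test_tokens, docs[i]), i))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Toy training data (hypothetical).
docs = [
    {"goal", "match", "team"},
    {"team", "coach", "goal"},
    {"stock", "market", "price"},
    {"price", "bank", "stock"},
]
labels = ["sports", "sports", "finance", "finance"]

features = category_feature_words(docs, labels, top_k=2)
subsets = build_subsets(docs, labels, features)
print(classify({"stock", "price", "rise"}, docs, labels, subsets))
```

In this sketch the saving comes from `classify` comparing the test text only against the candidate samples rather than the whole training set, which is the source of the efficiency gain the abstract reports; how large that saving is in practice depends on how the subsets are built.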
Key words: short text classification; KNN classification; category feature; HowNet; efficiency