• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

A multi-stage classification KNN algorithm based on center vector

LIU Shu-chang, ZHANG Zhong-lin

  1. (School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China)
  • Received: 2015-12-07 Revised: 2016-02-22 Online: 2017-09-25 Published: 2017-09-25

Abstract:

The KNN algorithm has two disadvantages when classifying Chinese texts: it is sensitive to the uneven distribution of training samples, and it incurs high computational overhead. Building on existing improved algorithms, we propose a multi-stage classification KNN algorithm. First, the algorithm trims the training samples according to their density, so that the sample distribution becomes more uniform, and then calculates the class center vector of each class. Second, provided that the class center vectors are accurate, we move the complex calculations from the classification stage to the classifier training process. Finally, the algorithm uses an appropriate value of m (the number of primary categories) to identify the text category according to the nearest neighbors. Experimental results show that the proposed algorithm not only reduces computational complexity but also significantly improves classification speed without degrading classification accuracy.

Key words: text classification, multi-stage classifier, class center vector, K-nearest neighbor
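
The three stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the density criterion, the similarity measure, or the voting rule, so the greedy thinning rule with threshold eps, the use of cosine similarity, the majority vote, and all function names (normalize, trim_dense, class_centers, classify) are assumptions made for illustration. Only m, the number of primary categories, comes from the abstract.

```python
from collections import Counter

import numpy as np


def normalize(X):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def trim_dense(X, y, eps):
    """Assumed density rule: greedily drop a sample when an already-kept
    sample of the same class lies within cosine distance eps of it."""
    keep = []
    for i in range(len(X)):
        same = [j for j in keep if y[j] == y[i]]
        if not same or np.max(X[same] @ X[i]) < 1.0 - eps:
            keep.append(i)
    return X[keep], y[keep]


def class_centers(X, y):
    """Training stage: one center vector per class (mean of its samples)."""
    labels = np.unique(y)
    centers = np.vstack([X[y == c].mean(axis=0) for c in labels])
    return labels, normalize(centers)


def classify(x, X, y, labels, centers, m=2, k=3):
    """Classification stage: keep the m classes whose centers are most
    similar to x, then run KNN only on samples from those classes."""
    x = normalize(x[None, :])[0]
    top = labels[np.argsort(centers @ x)[::-1][:m]]   # m primary categories
    mask = np.isin(y, top)
    Xc, yc = X[mask], y[mask]
    nearest = np.argsort(Xc @ x)[::-1][:k]            # k most similar samples
    return Counter(yc[nearest].tolist()).most_common(1)[0][0]


# Toy data: three classes in 2-D, with one duplicated sample in class 0.
X = normalize(np.array([[1.0, 0.1], [1.0, 0.1], [0.9, 0.2],
                        [0.1, 1.0], [0.2, 0.9],
                        [-1.0, 0.1], [-0.9, 0.2]]))
y = np.array([0, 0, 0, 1, 1, 2, 2])

X_trim, y_trim = trim_dense(X, y, eps=1e-6)
labels, centers = class_centers(X_trim, y_trim)
print(classify(np.array([0.95, 0.15]), X_trim, y_trim, labels, centers))
```

Under these assumptions, the saving comes from restricting the neighbor search to the samples of the m candidate classes instead of the whole training set; the class centers are computed once during training.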