J4 ›› 2013, Vol. 35 ›› Issue (7): 149-155.
• Articles •
TURDI Tohti,AHMATJAN Ablat,MUYASSAR Aniwar,ASKAR Hamdulla
Received:
Revised:
Online:
Published:
Abstract:
This paper introduces the K-means and GAAC (group-average agglomerative clustering) methods and examines how two feature extraction methods affect Uyghur text representation and clustering efficiency. Using a large-scale text corpus, Uyghur text clustering experiments are carried out with both methods and their performance is compared. K-means depends heavily on the initial cluster centers and is therefore unstable, while GAAC has high computational complexity. To address both shortcomings, this paper proposes a Uyghur text clustering algorithm that combines GAAC and K-means. The proposed algorithm has two steps. First, GAAC is applied to a small subset of the texts to obtain good initial cluster centers. Second, K-means, seeded with those centers, quickly clusters the full, large text set. Experimental results show that the proposed algorithm significantly improves both clustering accuracy and running time.
Key words: Uyghur text; text clustering; K-means; GAAC; combined algorithm
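
The two-step procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the feature representation, the similarity measure, or how the small subset is chosen, so the sketch assumes unit-length document vectors, cosine similarity, and uniform random sampling. All function names and parameters (`gaac_centroids`, `kmeans`, `combined_clustering`, `sample_size`, `seed`) are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gaac_centroids(sample, k):
    """Step 1 (assumed form): group-average agglomerative clustering on a
    small sample of unit vectors, merging until at most k clusters remain.
    Returns the normalized centroid of each remaining cluster."""
    # Each cluster is stored as (sum of member vectors, member count).
    clusters = [(list(v), 1) for v in sample]
    while len(clusters) > k:
        best_sim, best = -float("inf"), None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = [a + b for a, b in zip(clusters[i][0], clusters[j][0])]
                n = clusters[i][1] + clusters[j][1]
                # Average pairwise similarity inside the merged cluster;
                # this closed form is valid only for unit-length vectors.
                sim = (dot(s, s) - n) / (n * (n - 1))
                if sim > best_sim:
                    best_sim, best = sim, (i, j, s, n)
        i, j, s, n = best
        clusters[i] = (s, n)
        del clusters[j]
    return [normalize(s) for s, _ in clusters]


def kmeans(docs, initial_centroids, max_iter=50):
    """Step 2: K-means with cosine similarity, started from given centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(len(centroids)), key=lambda c: dot(d, centroids[c]))
            for d in docs
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return labels, centroids


def combined_clustering(docs, k, sample_size, seed=0):
    """Seed K-means on all docs with GAAC centroids from a random sample."""
    docs = [normalize(d) for d in docs]
    rng = random.Random(seed)
    sample = rng.sample(docs, min(sample_size, len(docs)))
    seeds = gaac_centroids(sample, k)
    return kmeans(docs, seeds)
```

The pairwise search in `gaac_centroids` is what makes agglomerative clustering expensive, which is why this sketch runs it only on the sample; `sample_size` should be at least `k` so that `k` seed centroids are actually produced.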
TURDI Tohti,AHMATJAN Ablat,MUYASSAR Aniwar,ASKAR Hamdulla. Combined algorithm of GAAC and K-means for Uyghur text clustering [J]. J4, 2013, 35(7): 149-155.
URL: http://joces.nudt.edu.cn/EN/
http://joces.nudt.edu.cn/EN/Y2013/V35/I7/149