• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2013, Vol. 35 ›› Issue (2): 103-108.

• Articles •

An efficient algorithm for Chinese text clustering

MA Jialin,LIU Jinling,YU Changhui   

  1. (School of Computer Engineering, Huaiyin Institute of Technology, Huai’an 223003, China)
  • Received: 2012-01-10 Revised: 2012-04-01 Online: 2013-02-25 Published: 2013-02-25

Abstract:

Text clustering algorithms must cope with extremely sparse, high-dimensional vectors. Traditional dimension-reduction methods extract text features statistically under the assumption that keywords are independent, so they often ignore contextual semantic relations and lose a considerable amount of the text's semantics. In this paper, lexical chains are constructed using “HowNet”, and each chain is assigned a weight computed from the similarity of semantic classes. The two lexical chains with the largest weights are then selected to form a keyword sequence that represents the text. On this basis, a text clustering algorithm based on theme lexical chains (TCABTLC) is proposed. It addresses the loss of clustering efficiency caused by high-dimensional, sparse text vectors and obtains better clustering results. Experiments show that the algorithm greatly improves time efficiency while maintaining good accuracy.
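
The keyword-selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the similarity or weighting formulas, so several parts are assumptions: `toy_similarity` and its `TOY_CLASSES` table are hypothetical stand-ins for a HowNet-based semantic-class similarity, the greedy chain construction with a `threshold` parameter is one simple way to group related words, and `chain_weight` simply counts the word occurrences in a chain.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical semantic classes standing in for HowNet (assumption).
TOY_CLASSES: Dict[str, str] = {
    "bank": "finance", "loan": "finance", "money": "finance",
    "interest": "finance", "river": "nature", "tree": "nature",
}


def toy_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical words or words in the same toy class, else 0.0."""
    if a == b:
        return 1.0
    class_a: Optional[str] = TOY_CLASSES.get(a)
    return 1.0 if class_a is not None and class_a == TOY_CLASSES.get(b) else 0.0


def build_lexical_chains(
    words: List[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[List[str]]:
    """Greedily append each word to the first chain holding a similar word."""
    chains: List[List[str]] = []
    for word in words:
        for chain in chains:
            if any(similarity(word, member) >= threshold for member in chain):
                chain.append(word)
                break
        else:
            # No existing chain is similar enough, so start a new one.
            chains.append([word])
    return chains


def chain_weight(chain: List[str]) -> float:
    """Assumed weight: the number of word occurrences in the chain."""
    return float(len(chain))


def top_two_keywords(
    words: List[str], similarity: Callable[[str, str], float]
) -> List[str]:
    """Keep the two heaviest chains and return their distinct words in order."""
    chains = build_lexical_chains(words, similarity)
    ranked = sorted(chains, key=chain_weight, reverse=True)[:2]
    keywords: List[str] = []
    for chain in ranked:
        for word in chain:
            if word not in keywords:
                keywords.append(word)
    return keywords


if __name__ == "__main__":
    doc = ["bank", "loan", "river", "money", "loan", "tree", "interest", "weather"]
    print(top_two_keywords(doc, toy_similarity))
```

In this toy input the word "weather" forms a one-word chain and is dropped, so the document is represented by a short keyword sequence instead of its full vocabulary. A clustering step would then operate on these reduced representations.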

Key words: HowNet; vector model; lexical chain; text clustering