An efficient algorithm for Chinese text clustering

J4, 2013, Vol. 35, Issue (2): 103-108.

MA Jialin,LIU Jinling,YU Changhui

(School of Computer Engineering,Huaiyin Institute of Technology,Huai’an 223003,China)

Received:2012-01-10 Revised:2012-04-01 Online:2013-02-25 Published:2013-02-25

Abstract

Text clustering algorithm faces the extremely sparse highdimensional vector problem, the traditional dimension reduction methods statistically extract text features by assuming that the key words are independent. They often ignore the text semantic relations in the context, leading to considerable loss of text semantics. In this paper, using “HowNet”, by computing the similarity of the semantic class, a weighted value of the lexical chain is constructed. Depending on the size of the weights, the two lexical chains with two largest weights are chosen to be composed of representative text keyword sequence. Then, a text clustering algorithm based on the theme of lexical chain (TCABTLC) is proposed. It can solve the issue that the text vector with high dimension and sparse leads to the operating efficiency of the clustering algorithm, and obtain better clustering results. The experiments show that, to maintain good accuracy, the time efficiency of the clustering algorithm has been greatly improved.

Key words: HowNet; vector model; lexical chain; text clustering
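The abstract outlines the method only at a high level: weight lexical chains built from semantic similarity, keep the two heaviest chains as a text's keyword sequence, then cluster texts on those short sequences instead of full sparse vectors. The Python sketch below illustrates that general structure under explicit assumptions, none of which comes from the paper. HowNet is not used: a hypothetical `SEMANTIC_CLASS` table of dotted class codes and a toy `semantic_similarity` function stand in for HowNet-based semantic-class similarity. Chains are built greedily with an assumed similarity threshold, a chain's weight is assumed to be its number of word occurrences (`chain_weight`), and `cluster_documents` is a simple single-pass clustering chosen only for illustration, not TCABTLC itself.

```python
# HYPOTHETICAL stand-in for HowNet: each word maps to a dotted
# semantic-class code. Real HowNet (sememe-based) similarity is not
# reproduced here; this toy table only makes the sketch runnable.
SEMANTIC_CLASS = {
    "computer": "tech.device.computer", "laptop": "tech.device.computer",
    "software": "tech.program", "program": "tech.program",
    "football": "sport.ball.football", "team": "sport.group",
    "match": "sport.event", "goal": "sport.event",
}


def semantic_similarity(w1: str, w2: str) -> float:
    """Share of leading class levels two words have in common (toy measure)."""
    c1, c2 = SEMANTIC_CLASS.get(w1), SEMANTIC_CLASS.get(w2)
    if c1 is None or c2 is None:
        # Words missing from the table only match themselves.
        return 1.0 if w1 == w2 else 0.0
    p1, p2 = c1.split("."), c2.split(".")
    shared = 0
    for a, b in zip(p1, p2):
        if a != b:
            break
        shared += 1
    return shared / max(len(p1), len(p2))


def build_lexical_chains(words: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedily attach each word to the most similar existing chain."""
    chains: list[list[str]] = []
    for word in words:
        best_chain, best_sim = None, 0.0
        for chain in chains:
            sim = max(semantic_similarity(word, m) for m in chain)
            if sim > best_sim:
                best_chain, best_sim = chain, sim
        if best_chain is not None and best_sim >= threshold:
            best_chain.append(word)
        else:
            chains.append([word])  # no chain is similar enough: start a new one
    return chains


def chain_weight(chain: list[str]) -> float:
    # ASSUMED weighting: number of word occurrences in the chain.
    return float(len(chain))


def theme_keywords(words: list[str], top_k: int = 2) -> list[str]:
    """Keyword sequence from the top_k heaviest chains (stable sort keeps text order on ties)."""
    chains = sorted(build_lexical_chains(words), key=chain_weight, reverse=True)
    keywords: list[str] = []
    for chain in chains[:top_k]:
        for w in chain:
            if w not in keywords:
                keywords.append(w)
    return keywords


def set_similarity(a: set[str], b: set[str]) -> float:
    """Symmetric average best-match similarity between two keyword sets."""
    if not a or not b:
        return 0.0
    ab = sum(max(semantic_similarity(x, y) for y in b) for x in a) / len(a)
    ba = sum(max(semantic_similarity(y, x) for x in a) for y in b) / len(b)
    return (ab + ba) / 2


def cluster_documents(docs: list[list[str]], threshold: float = 0.3) -> list[list[int]]:
    """Single-pass clustering on theme keyword sets (illustrative stand-in)."""
    clusters: list[tuple[set[str], list[int]]] = []
    for i, words in enumerate(docs):
        kws = set(theme_keywords(words))
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = set_similarity(kws, cluster[0])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best[0].update(kws)  # grow the cluster's keyword set
            best[1].append(i)
        else:
            clusters.append((kws, [i]))
    return [members for _, members in clusters]


DOCS = [
    ["computer", "software", "laptop", "program", "team"],
    ["football", "match", "goal", "team"],
    ["laptop", "program", "computer"],
    ["goal", "match", "team", "football"],
]

if __name__ == "__main__":
    print(theme_keywords(DOCS[0]))  # ['computer', 'laptop', 'software', 'program']
    print(cluster_documents(DOCS))  # [[0, 2], [1, 3]]
```

With the toy table, the stray word "team" in the first document ends up in a light chain and is dropped from its keyword sequence, and the two computing documents and the two football documents fall into separate clusters. A faithful implementation would replace `semantic_similarity` with a HowNet-based measure and `chain_weight` with the paper's own weighting formula, neither of which is specified in the abstract.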

MA Jialin,LIU Jinling,YU Changhui. An efficient algorithm for Chinese text clustering[J]. J4, 2013, 35(2): 103-108.