• Official journal of the China Computer Federation
• Chinese Science and Technology Core Journal
• Chinese Core Journal

J4, 2013, Vol. 35, Issue (7): 164-168.


Topictext clustering algorithm based on word co-occurrence           

BAI Qiuchan1,JIN Chunxia2,ZHANG Hui2,ZHOU Haiyan2   

  1. School of Electronic and Electrical Engineering, Huaiyin Institute of Technology, Huai'an 223003;
  2. School of Computer Engineering, Huaiyin Institute of Technology, Huai'an 223003, China
  • Received: 2012-04-09  Revised: 2012-08-13  Online: 2013-07-25  Published: 2013-07-25

Abstract:

A document's topic is the key to text clustering, and co-occurring words express a document's theme strongly. Building on a study of existing topic mining methods and word co-occurrence extraction algorithms, this paper proposes a topic text clustering algorithm based on association rules and word co-occurrence. First, the algorithm extracts the co-occurring words of each document with an association rule mining algorithm. Second, it computes a similarity measure over the co-occurring word pairs. Finally, it clusters the documents with a hierarchical clustering algorithm. Experimental results show that, compared with other algorithms, the word co-occurrence based hierarchical clustering algorithm not only greatly reduces the dimensionality of the text vectors and the algorithmic complexity but also significantly improves the efficiency and accuracy of text clustering, yielding good-quality clusters.
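The three steps in the abstract can be sketched as a small Python pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the mining thresholds, the similarity measure, or the linkage criterion, so the code assumes (1) each sentence of a tokenized document is one transaction, and a word pair is kept when its support count and rule confidence pass the hypothetical thresholds `min_support` and `min_confidence`; (2) Jaccard similarity between the pair sets of two documents; and (3) average-linkage agglomerative clustering stopped at a chosen `num_clusters`. All function and parameter names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_pairs(sentences, min_support=2, min_confidence=0.5):
    """Mine co-occurring word pairs from one document (hypothetical helper).

    Each sentence (a list of words) is one transaction. A pair (a, b) is kept
    when it appears together in at least `min_support` sentences and the
    stronger of the rules a -> b and b -> a reaches `min_confidence`.
    """
    word_count = Counter()
    pair_count = Counter()
    for sentence in sentences:
        words = sorted(set(sentence))  # sorted so each pair has one canonical order
        word_count.update(words)
        pair_count.update(combinations(words, 2))
    pairs = set()
    for (a, b), n in pair_count.items():
        if n < min_support:
            continue
        # confidence(a -> b) = support(a, b) / support(a)
        confidence = max(n / word_count[a], n / word_count[b])
        if confidence >= min_confidence:
            pairs.add((a, b))
    return pairs


def jaccard(p, q):
    """Assumed similarity: Jaccard index of two documents' pair sets."""
    if not p and not q:
        return 0.0
    return len(p & q) / len(p | q)


def hierarchical_cluster(features, num_clusters):
    """Average-linkage agglomerative clustering over documents (assumed linkage)."""
    clusters = [[i] for i in range(len(features))]

    def linkage(c1, c2):
        # Mean pairwise similarity between members of the two clusters.
        total = sum(jaccard(features[i], features[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > num_clusters:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            s = linkage(clusters[x], clusters[y])
            if best is None or s > best[0]:
                best = (s, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe: combinations() guarantees x < y
    return clusters


def cluster_documents(docs, num_clusters, min_support=2, min_confidence=0.5):
    """Step 1: mine pairs; steps 2 and 3: similarity plus hierarchical clustering."""
    features = [cooccurrence_pairs(d, min_support, min_confidence) for d in docs]
    return hierarchical_cluster(features, num_clusters)


if __name__ == "__main__":
    # Toy corpus: each document is a list of tokenized sentences.
    docs = [
        [["apple", "pie", "recipe"], ["apple", "pie", "oven"], ["bake", "apple", "pie"]],
        [["apple", "pie", "crust"], ["apple", "pie", "sugar"]],
        [["stock", "market", "price"], ["stock", "market", "crash"]],
        [["stock", "market", "index"], ["market", "stock", "trader"]],
    ]
    print(cluster_documents(docs, 2))  # prints [[0, 1], [2, 3]]
```

Representing each document by a handful of co-occurring pairs instead of a full term vector is one plausible reading of how the method reduces dimensionality. The brute-force pair count here yields the same frequent 2-itemsets as Apriori's second pass, only without its candidate pruning.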

Key words: word co-occurrence; association rules; data mining; hierarchical clustering