J4 ›› 2011, Vol. 33 ›› Issue (6): 154-158.
• Articles •
JIN Chunxia,ZHOU Haiyan
Abstract:
Document clustering is an important research topic in natural language processing and is widely applied in areas such as information retrieval, web mining, and digital libraries. Because feature terms at different positions in a document contribute differently to its content, this paper proposes TCABPW, a text clustering algorithm based on position weighting. A new text vector is constructed by selecting the L top-weighted feature terms that reflect the document's topic, and this vector is then clustered using hierarchical clustering and the K-means method. The results show that, without degrading clustering quality, the algorithm greatly reduces the dimensionality of the text vectors and significantly improves the stability and purity of the clustering, yielding clusters of good quality.
Key words: text clustering; text vector; feature selection; position weighting; similarity between clusters
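The idea of position weighting followed by top-L term selection can be illustrated with a minimal Python sketch. This is not the authors' TCABPW implementation: the abstract does not give the weighting scheme, so the position names (`title`, `lead`, `body`), their weights, and the helper names below are all assumptions chosen for illustration. Cosine similarity stands in for whatever similarity measure the paper uses.

```python
from collections import Counter
import math

# Hypothetical position weights: the abstract does not give the actual values.
POSITION_WEIGHTS = {"title": 3.0, "lead": 2.0, "body": 1.0}


def position_weighted_terms(doc):
    """Sum position weights per term. `doc` maps a position name to its text."""
    scores = Counter()
    for position, text in doc.items():
        weight = POSITION_WEIGHTS.get(position, 1.0)
        for term in text.lower().split():
            scores[term] += weight
    return scores


def top_l_vector(doc, l):
    """Keep only the l highest-weighted terms as a sparse vector (dict)."""
    scores = position_weighted_terms(doc)
    # Sort by descending weight, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:l])


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

The reduced vectors returned by `top_l_vector` could then be passed to a hierarchical or K-means clustering routine; that step is omitted here to keep the sketch short.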
JIN Chunxia,ZHOU Haiyan. A Text Clustering Algorithm Based on Position Weighting[J]. J4, 2011, 33(6): 154-158.
URL: http://joces.nudt.edu.cn/EN/
http://joces.nudt.edu.cn/EN/Y2011/V33/I6/154