• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2012, Vol. 34 ›› Issue (6): 153-158.

• Papers •

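A Text Clustering Method Based on Iterative Convergence of Initial Centers

LIU Jinling1, LIU Guoxiang2, YANG Fengxia2

  1. (1. School of Computer Engineering, Huaiyin Institute of Technology, Huai'an, Jiangsu 223003; 2. Department of Computer Science, Cangzhou Normal University, Cangzhou, Hebei 061001)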
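  • Received: 2011-03-30 Revised: 2011-06-24 Online: 2012-06-25 Published: 2012-06-25
  • Funding:

    Hebei Province Science and Technology Support Program (10213581); Huai'an Science and Technology Program (HAG09061); Huaiyin Institute of Technology Key Fund (HGA0907)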

A Text Clustering Algorithm for Iteration Convergence Based on Initial Centers

LIU Jinling1,LIU Guoxiang2,YANG Fengxia2   

  1. (1.School of Computer Engineering,Huaiyin Institute of Technology,Huai’an 223003;2.Department of Computer,Cangzhou Normal University,Cangzhou 061001,China)
  • Received:2011-03-30 Revised:2011-06-24 Online:2012-06-25 Published:2012-06-25

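Abstract:

The initial cluster centers are obtained from two or three KMeans iterations, and a set of directions with good discriminative power is selected to construct the IMIC coordinate system. In this coordinate system, a rescaling function is constructed for each coordinate axis to improve the effectiveness of clustering decisions. The IMIC algorithm then iterates repeatedly until it converges to the final solution. The time complexity of IMIC stays at the same order of magnitude as that of KMeans. Experimental results show that IMIC achieves fairly good clustering quality.

Keywords: iterative convergence, text, clustering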

Abstract:

Two or three KMeans iterations are used to obtain the initial cluster centers. During text clustering, a set of discriminative directions is chosen to construct the IMIC coordinate system, and a rescaling function is constructed for each axis, according to the distribution characteristics of the initial clusters, in order to improve the effectiveness of clustering decisions. The IMIC iterative algorithm converges to the final solution. Because it uses a KMeans-like iteration strategy, the time complexity of IMIC remains of the same order as that of KMeans. The experimental results show that the IMIC algorithm achieves good clustering quality.

Key words: iteration convergence; text; clustering
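
The pipeline outlined in the abstract (a short KMeans warm start, a coordinate system built from discriminative directions, per-axis rescaling, then KMeans-like iteration to convergence) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' IMIC algorithm: the abstract does not say how the directions or the rescaling functions are defined, so the sketch substitutes unit vectors from the global mean to each initial center as axes, and the inverse within-cluster spread along each axis as the rescaling. All function names (`warm_start`, `build_axes`, `imic_like`) are hypothetical, and documents are assumed to be already encoded as numeric row vectors (for example TF-IDF weights).

```python
import numpy as np


def assign(points, centers):
    """Return the index of the nearest center for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def update(points, labels, centers):
    """Recompute each center as the mean of its members; empty clusters keep their center."""
    new = centers.copy()
    for j in range(len(centers)):
        members = points[labels == j]
        if len(members) > 0:
            new[j] = members.mean(axis=0)
    return new


def warm_start(points, k, n_iter=3, seed=0):
    """Stage 1: a few plain k-means iterations to get rough initial centers."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        centers = update(points, assign(points, centers), centers)
    return centers, assign(points, centers)


def build_axes(points, centers, labels):
    """Stage 2 (assumed stand-in): one axis per initial center, pointing from the
    global mean to that center, scaled by the inverse within-cluster spread."""
    mean = points.mean(axis=0)
    axes = centers - mean
    axes = axes / np.maximum(np.linalg.norm(axes, axis=1, keepdims=True), 1e-12)
    projected = (points - mean) @ axes.T            # shape (n_points, k)
    projected_centers = (centers - mean) @ axes.T   # shape (k, k)
    spread = (projected - projected_centers[labels]).std(axis=0)
    scale = 1.0 / np.maximum(spread, 1e-12)
    return mean, axes, scale


def imic_like(points, k, warm_iters=3, max_iter=100, seed=0):
    """Stage 3: k-means-like iteration in the rescaled coordinates until the
    assignments stop changing (or max_iter is reached)."""
    points = np.asarray(points, dtype=float)
    centers, labels = warm_start(points, k, warm_iters, seed)
    mean, axes, scale = build_axes(points, centers, labels)
    z = ((points - mean) @ axes.T) * scale
    z_centers = update(z, labels, ((centers - mean) @ axes.T) * scale)
    for _ in range(max_iter):
        new_labels = assign(z, z_centers)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        z_centers = update(z, labels, z_centers)
    return labels
```

Each pass in stage 3 costs one distance computation between every point and every center, just like a KMeans pass, which is consistent with the abstract's claim that the overall cost stays at the same order as KMeans. Calling `imic_like(X, k=3)` on an `(n_documents, n_features)` array returns one cluster label per document.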