  • Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4, 2011, Vol. 33, Issue (6): 138-143.


An Improved Semi-Supervised K-Means Clustering Algorithm

YUAN Liyong, WANG Jiyi

  1. (School of Information Science and Engineering, Zhejiang Normal University, Jinhua 321004, China)
  • Received: 2010-07-15  Revised: 2010-12-08  Online: 2011-06-25  Published: 2011-06-25
  • About the authors: YUAN Liyong (b. 1978), male, from Shengzhou, Zhejiang, lecturer; research interests: evolutionary computation and machine learning. WANG Jiyi (b. 1953), male, from Ningbo, Zhejiang, professor; research interests: artificial intelligence and machine learning.
  • Funding:

    2010 Project of the Zhejiang Provincial Department of Education (Y201016493)

Abstract:

Semi-supervised clustering uses a small amount of labeled data to guide the learning of unlabeled data and thereby improve clustering performance. Because K-means-based clustering algorithms are poor at discovering non-spherical clusters, this paper proposes a new approach: the gravitational influence of the labeled data on each unlabeled data point is incorporated into the cluster-assignment decision. The paper defines a gravitational influence degree between a cluster and a data point and designs a semi-supervised K-means clustering algorithm with a gravitational parameter. Experiments show that, on data with non-spherical cluster distributions, the new algorithm achieves better clustering results than existing semi-supervised K-means methods.

Key words: semi-supervised clustering; Constrained-KMeans; labeled data; gravitational influence; non-spherical cluster