• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2011, Vol. 33 ›› Issue (1): 166-170. doi: 10.3969/j.issn.1007130X.2011.

• Papers •

An Automatic Clustering Method Using Subsampling for the KD-Tree

PAN Zhangming

  1. (Department of Computer Science and Technology, Guangdong University of Finance, Guangzhou 510521, China)
  • Received: 2010-02-26 Revised: 2010-05-30 Online: 2011-01-25 Published: 2011-01-25
  • Corresponding author: PAN Zhangming E-mail: panzhangming@163.com
  • About the author: PAN Zhangming (born 1969), male, from Wuhu, Anhui; master's degree, lecturer; research interests: intelligent computing and pattern recognition.

Abstract:

Automatic clustering methods based on evolutionary algorithms can search for the global optimum of the objective function and determine the number of clusters automatically, but their running time is excessive. This paper proposes an automatic clustering method based on KD-tree subsampling. The sample space is partitioned into subspaces with a KD-tree, and points are drawn at random within each subspace to form a KD-tree subsample. Automatic clustering is then performed on the subsample, and finally K-Means is applied to the whole sample set to refine the clustering result obtained from the subsample. The method effectively avoids the biased distribution that plain random subsampling can produce, and it achieves good clustering results even when the subsample is a very small fraction of the data. Simulation results show that the method significantly shortens the running time of evolutionary automatic clustering without a noticeable loss in clustering quality.
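The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not state the splitting rule, the leaf size, or the sampling ratio, so the median split with cycling axes and the names `kd_partition`, `kd_subsample`, `kmeans_refine`, `leaf_size`, and `ratio` are all assumptions made here for illustration. The evolutionary (differential evolution) step that finds the cluster number is omitted; a fixed pair of seed points stands in for its output.

```python
import random


def kd_partition(points, leaf_size, depth=0):
    """Split points at the median of one coordinate, cycling through the axes.

    Returns the leaf cells of the resulting KD-tree as a list of point lists.
    (Assumed construction; the abstract does not specify the splitting rule.)
    """
    if len(points) <= leaf_size:
        return [points]
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (kd_partition(ordered[:mid], leaf_size, depth + 1)
            + kd_partition(ordered[mid:], leaf_size, depth + 1))


def kd_subsample(points, leaf_size, ratio, rng):
    """Draw roughly `ratio` of the points at random from every leaf cell."""
    sample = []
    for cell in kd_partition(points, leaf_size):
        k = min(len(cell), max(1, round(ratio * len(cell))))
        sample.extend(rng.sample(cell, k))
    return sample


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_refine(points, centers, iterations=20):
    """Run Lloyd (K-Means) iterations on `points`, starting from `centers`."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Toy data: two well-separated Gaussian blobs (hypothetical example data).
rng = random.Random(0)
data = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
        + [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(200)])

sample = kd_subsample(data, leaf_size=50, ratio=0.1, rng=rng)

# Stand-in for the evolutionary step: two extreme subsample points as seeds.
seeds = [min(sample, key=sum), max(sample, key=sum)]
subsample_centers = kmeans_refine(sample, seeds)

# Final step from the abstract: refine the subsample result on the full set.
centers = kmeans_refine(data, subsample_centers)
```

Because every leaf cell contributes points, the subsample covers all occupied regions of the space, which is the property the abstract contrasts with a plain random subsample that may miss sparse regions when the sampling ratio is small.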

Key words: KDtree;subsample;differential evolution;automatic clustering