• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊
论文

一种基于KD树子样的自动聚类方法

展开
  • (广东金融学院计算机科学与技术系,广东 广州 510521)
潘章明(1969),男,安徽芜湖人,硕士,讲师,研究方向为智能计算和模式识别。

收稿日期: 2010-02-26

  修回日期: 2010-05-30

  网络出版日期: 2011-01-25

An Automatic Clustering Method Using SubSampling for the KDTree

Expand
  • (Department of Computer Science and Technology,Guangdong University of Finance,Guangzhou 510521,China)

Received date: 2010-02-26

  Revised date: 2010-05-30

  Online published: 2011-01-25

摘要

基于进化算法的自动聚类方法具有搜索目标函数全局最优和自动发现聚类数的优点,同时也存在时间代价过高的缺陷。本文提出一种基于KD树子样的自动聚类方法,该方法使用KD树对样本空间进行分割,并在各子空间中随机取样形成KD树子样,然后在子样中自动聚类,最后运用KMeans在整个样本集中优化子样中的聚类结果。本文方法能够有效避免随机子样分布有偏的缺陷,即使比例很小的子样也能获得较好的聚类效果。仿真结果表明,本文方法能够保证聚类效果没有明显下降的情况下,显著缩短进化算法自动聚类的时间。

本文引用格式

潘章明 . 一种基于KD树子样的自动聚类方法[J]. 计算机工程与科学, 2011 , 33(1) : 166 -170 . DOI: 10.3969/j.issn.1007130X.2011.

Abstract

The evolution theory based automatic clustering method has advantages in finding the global optimum and the cluster number, but shows the lack of efficiency in machine time. An autoclustering method using the KDTree subsampling technique is proposed in this paper. The sample space is divided into subspaces using the KDTree method. In each subspace, the KDTree subsamples are produced by randomly sampling for later autoclustering. The KMeans method is used to optimize the cluster results of the subsamples. The method can effectively overcome the defect of biased distribution for random subsamples and give good cluster results even for small samples. The simulation results show that the method remarkably reduces the machine time for auto clustering without decreasing the clustering effect.

参考文献

[1]Bandyopadhyay  S,Maulik U. Genetic Clustering for Automatic Evolution of Clusters and Application to Image Classification[J]. Pattern Recognition, 2002,35(6):11971208.
[2]Omran M G H, Engelbrecht A P, Salman A. Dynamic Clustering Using Particle Swarm Optimization with Application in Unsupervised Image Classification[J]. Proc of World Academy of Science, Engineering and Technology, 2005, 9(11):199204.
[3]Abraham A, Das S,Roy S. Soft Computing for Knowledge Discovery and Data Mining[M]. Springer, 2007.
[4]Das S, Abraham A, Konar A. Automatic Clustering Using an Improved Differential Evolution Algorithm[J]. IEEE Trans on Systems, Man, and CyberneticsPart A: Systems and Humans, 2008, 38(1):218236.
[5]Bradley P S, Fayyad U M. Refining Initial Points for KMeans Clustering[C]∥Proc of the Fifteenth Int’l Conf on Machine Learning, 1998:9199.
[6]Rocke D M, Dai J. Sampling and Subsampling for Cluster Analysis in Data Mining[J]. Applications to Sky Survey Data, Data Mining and Knowledge Discovery,2003,7(2):215232.
[7]Tamminen M. Comment on Quad and Octrees[J]. Communications of the ACM,1984, 30(3):204212.
[8]仇明华, 殷丽华, 李斌. 基于多维二进制搜索树的异常检测技术[J]. 计算机工程与应用, 2007, 43(22):122125.
[9]范文山, 王斌. 启发式探查最佳分割平面的快速KDTree构建方法[J]. 计算机学报, 2009, 32(2):185192.
[10]Stephen R J,Henghan C. A Method for Initializing the Kmeans Clustering Algorithm Using Kdtrees[J]. Pattern Recognition Letters, 2007,28(8):965973.

文章导航

/