Article

An Automatic Clustering Method Using SubSampling for the KDTree

  • (Department of Computer Science and Technology, Guangdong University of Finance, Guangzhou 510521, China)

Received date: 2010-02-26

Revised date: 2010-05-30

Published online: 2011-01-25

Abstract

Automatic clustering methods based on evolutionary computation are good at finding the global optimum and determining the number of clusters, but they are computationally expensive. This paper proposes an automatic clustering method that uses KD-tree subsampling. The sample space is first partitioned into subspaces with a KD-tree. Within each subspace, points are drawn at random, and together these points form the KD-tree subsample on which automatic clustering is run. The K-means method is then used to refine the clustering results obtained from the subsample. The method effectively overcomes the skewed distribution that plain random subsampling can produce, and it gives good clustering results even with small subsamples. Simulation results show that the method markedly reduces the running time of automatic clustering without degrading clustering quality.
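
The subsampling and refinement steps described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`kd_leaves`, `kd_subsample`, `kmeans_refine`), the median split on the widest dimension, and the parameters `leaf_size` and `per_leaf` are all assumptions made for the example. The evolutionary automatic clustering step is left out; the sketch assumes some such method has already produced initial centers from the subsample.

```python
import random


def kd_leaves(points, leaf_size):
    """Partition points into KD-tree leaves of at most leaf_size points.

    Assumption: split at the median of the dimension with the largest
    spread. The paper may use a different splitting rule.
    """
    if len(points) <= leaf_size:
        return [points]
    dims = range(len(points[0]))
    axis = max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return kd_leaves(ordered[:mid], leaf_size) + kd_leaves(ordered[mid:], leaf_size)


def kd_subsample(points, leaf_size, per_leaf, rng):
    """Draw up to per_leaf points at random from every KD-tree leaf."""
    sample = []
    for leaf in kd_leaves(points, leaf_size):
        sample.extend(rng.sample(leaf, min(per_leaf, len(leaf))))
    return sample


def kmeans_refine(points, centers, iterations=10):
    """Refine given centers with plain Lloyd iterations (K-means)."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            groups[nearest].append(p)
        # Keep the old center when a group ends up empty.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else old
            for g, old in zip(groups, centers)
        ]
    return centers
```

Calling `kd_subsample(points, leaf_size=8, per_leaf=2, rng=random.Random(0))` on a list of coordinate tuples returns a subsample that takes the same number of points from every region of the space, which is the property the abstract credits with avoiding the skew of plain random sampling. Centers found on that subsample can then be passed to `kmeans_refine`.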

Cite this article

PAN Zhangming . An Automatic Clustering Method Using SubSampling for the KDTree[J]. Computer Engineering & Science, 2011 , 33(1) : 166 -170 . DOI: 10.3969/j.issn.1007130X.2011.

