• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal
Article

A Genetic Algorithm forHighDimensional Data Clustering

  • (Department of Computer Science, Shantou University, Shantou 515063, China)

Received date: 2009-03-31

  Revised date: 2009-10-21

  Online published: 2010-07-28

Abstract

Clustering analysis is an important topic in data mining. In many real applications, the data to be clustered are high-dimensional; document data and DNA microarray data, for example, typically have several hundred or even a thousand dimensions. In high-dimensional space the data are usually sparsely distributed, so most traditional clustering algorithms that work well on low-dimensional data become ineffective. To address this problem, this paper proposes a new high-dimensional data clustering approach based on genetic algorithms. The search capability of genetic algorithms is exploited to find effective feature subspaces for clustering. To capture how individual dimensions behave in clustering, the degree to which features contribute to subspace clustering is used as the fitness function. Experimental results on an artificial data set and a real-life data set, together with a comparison against the k-means algorithm, indicate the feasibility and efficiency of the proposed approach.
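
The general idea of searching for a clustering subspace with a genetic algorithm can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not give the encoding, operators, or exact fitness function, so the binary feature mask, the tournament selection, the uniform crossover, and the fitness used here (mean per-feature share of variance explained by a k-means partition) are all hypothetical choices, as are the function names `kmeans`, `fitness`, `ga_subspace`, and `make_data`.

```python
import numpy as np

def kmeans(X, k, rng, iters=20):
    """Plain Lloyd's k-means; returns labels and the matching cluster means."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def fitness(mask, X, k):
    """Assumed fitness: average, over the selected features, of the fraction
    of that feature's variance explained by the cluster partition."""
    if not mask.any():
        return 0.0  # an empty subspace cannot be clustered
    Xs = X[:, mask]
    # Fixed seed keeps the fitness deterministic for a given mask.
    labels, centers = kmeans(Xs, k, np.random.default_rng(0))
    total = ((Xs - Xs.mean(axis=0)) ** 2).sum(axis=0)
    within = ((Xs - centers[labels]) ** 2).sum(axis=0)
    return float(np.mean(1.0 - within / total))

def ga_subspace(X, k, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Evolve boolean feature masks; returns the best mask and its fitness."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5
    for _ in range(generations):
        fit = np.array([fitness(m, X, k) for m in pop])
        elite = pop[fit.argmax()].copy()
        # Binary tournament selection.
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((fit[a] >= fit[b])[:, None], pop[a], pop[b])
        # Uniform crossover with a shuffled copy of the parents.
        mates = parents[rng.permutation(pop_size)]
        take_parent = rng.random((pop_size, n_feat)) < 0.5
        children = np.where(take_parent, parents, mates)
        # Bit-flip mutation.
        children ^= rng.random((pop_size, n_feat)) < p_mut
        children[0] = elite  # elitism: keep the best mask unchanged
        pop = children
    fit = np.array([fitness(m, X, k) for m in pop])
    best = fit.argmax()
    return pop[best], float(fit[best])

def make_data(n=100, n_noise=8, seed=1):
    """Synthetic data: 2 informative features (two blobs) plus noise features."""
    rng = np.random.default_rng(seed)
    half = n // 2
    informative = np.vstack([rng.normal(0, 1, (half, 2)),
                             rng.normal(10, 1, (n - half, 2))])
    noise = rng.normal(0, 1, (n, n_noise))
    return np.hstack([informative, noise])
```

Calling `ga_subspace(make_data(), k=2)` returns a boolean mask over the ten synthetic features together with its fitness. Under this assumed fitness, masks that keep the two informative features and drop the noise features score higher than the full feature set, which is the effect the abstract attributes to searching for effective feature subspaces.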

Cite this article

SUN Haojun,XIONG Langhuan . A Genetic Algorithm forHighDimensional Data Clustering[J]. Computer Engineering & Science, 2010 , 32(8) : 94 -97 . DOI: 10.3969/j.issn.1007130X.2010.
