Computer Engineering & Science
A Genetic Algorithm for High-Dimensional Data Clustering
Received date: 2009-03-31
Revised date: 2009-10-21
Online published: 2010-07-28
Clustering analysis is an important topic in data mining. In many real applications, the data to be clustered are high-dimensional; document data and DNA microarray data, for example, generally have several hundred or even a thousand dimensions. In high-dimensional space the data are usually sparsely distributed, which makes most traditional clustering algorithms that work well on low-dimensional data ineffective. To address this problem, this paper proposes a new high-dimensional data clustering approach based on genetic algorithms. The search capability of genetic algorithms is exploited to find effective feature subspaces for clustering. To capture how individual dimensions behave in clustering, the degree to which features contribute to subspace clustering is used as the fitness function. Experimental results on an artificial data set and a real-life data set, together with a comparison against the k-means algorithm, indicate the feasibility and efficiency of the proposed approach.
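The abstract does not give the chromosome encoding or the exact fitness function, so the following is only a minimal sketch of the general idea under stated assumptions: each individual is a binary mask over the features, a small k-means run clusters the data in the selected subspace, and the fitness is the ratio of between-cluster to within-cluster scatter. That ratio is a stand-in for the paper's feature-contribution measure, not the measure itself. All names (`sqdist`, `kmeans`, `fitness`, `evolve`) and parameter values are hypothetical.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10):
    """Plain Lloyd iterations with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def fitness(data, mask, k):
    """ASSUMED fitness: between-cluster / within-cluster scatter in the subspace."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return float("-inf")  # an empty subspace cannot be clustered
    sub = [[row[j] for j in cols] for row in data]
    labels, centers = kmeans(sub, k)
    mean = [sum(col) / len(sub) for col in zip(*sub)]
    within = sum(sqdist(p, centers[lab]) for p, lab in zip(sub, labels))
    between = sum(sqdist(centers[lab], mean) for lab in labels)
    return between / (within + 1e-12)

def evolve(data, k, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Genetic search over feature masks; returns (best_mask, best_score)."""
    rng = random.Random(seed)
    n = len(data[0])
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(data, mask, k)
        return cache[key]

    # Start from the full feature space plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = [pop[0][:]]  # elitism: the best mask always survives
        while len(nxt) < pop_size:
            # Binary tournament selection for each parent.
            a, b = (max(rng.sample(pop, 2), key=score) for _ in range(2))
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return best, score(best)
```

Calling `evolve(data, k=2)` on a list of equal-length numeric rows returns a 0/1 mask marking the selected features together with its score. Because of elitism and the full-feature seed individual, the returned score is never worse than clustering in the full space under this assumed fitness.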
SUN Haojun, XIONG Langhuan. A Genetic Algorithm for High-Dimensional Data Clustering[J]. Computer Engineering & Science, 2010, 32(8): 94-97. DOI: 10.3969/j.issn.1007130X.2010.