A Genetic Algorithm forHighDimensional Data Clustering

SUN Haojun,XIONG Langhuan

doi:10.3969/j.issn.1007130X.2010.

Computer Engineering & Science, 2010, Vol. 32, Issue 8: 94-97

Paper

A Genetic Algorithm forHighDimensional Data Clustering

Expand

(Department of Computer Science, Shantou University, Shantou 515063, China)

Received date: 2009-03-31

Revised date: 2009-10-21

Online published: 2010-07-28

Abstract

Clustering analysis is an important topic in data mining. In many real applications, the data to be clustered are high-dimensional; document data and DNA microarray data, for example, typically have several hundred or even a thousand dimensions. In high-dimensional space the data are usually sparsely distributed, so most traditional clustering algorithms that work well on low-dimensional data become ineffective. To address this problem, this paper proposes a new high-dimensional data clustering approach based on genetic algorithms. The search capability of genetic algorithms is exploited to find effective feature subspaces for clustering. To capture the role that individual dimensions play in clustering, the fitness function is defined as the degree to which features contribute to subspace clustering. Experimental results on an artificial data set and a real-life data set, together with a comparison against the k-means algorithm, indicate the feasibility and efficiency of the proposed approach.

Key words: high-dimensional data clustering; genetic algorithm; feature subspace
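The abstract does not spell out the chromosome encoding, the genetic operators, or the exact fitness function, so the following is only a minimal sketch of the general idea of searching for a feature subspace with a genetic algorithm. Everything in it is an assumption made for illustration: the binary feature mask, the truncation selection with one-point crossover and bit-flip mutation, the stand-in fitness (the share of total scatter explained by a k-means partition of the subspace), the toy data, and all function and parameter names.

```python
import random

def kmeans_labels(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means with a fixed seed, so results are repeatable."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def scatter(points):
    """Sum of squared distances from each point to the group mean."""
    mean = [sum(col) / len(points) for col in zip(*points)]
    return sum(sum((a - m) ** 2 for a, m in zip(p, mean)) for p in points)

def fitness(mask, data, k):
    """Stand-in fitness (an assumption, not the paper's definition):
    share of total scatter explained by the clusters in the masked subspace."""
    dims = [i for i, bit in enumerate(mask) if bit]
    if not dims:
        return 0.0
    sub = [[row[i] for i in dims] for row in data]
    labels = kmeans_labels(sub, k)
    within = sum(
        scatter([p for p, lab in zip(sub, labels) if lab == c])
        for c in range(k)
        if c in labels
    )
    total = scatter(sub)
    return 1.0 - within / total if total > 0 else 0.0

def evolve(data, k, pop_size=12, generations=15, p_mut=0.1, seed=1):
    """Evolve binary feature masks; returns the best mask found."""
    rng = random.Random(seed)
    n = len(data[0])
    score = lambda m: fitness(m, data, k)
    # Seed the population with the full feature space as a baseline.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]  # truncation selection; elites survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=score)

def make_data(n_points=60, n_dims=8, relevant=(0, 1), seed=2):
    """Toy data: two clusters separated only along the `relevant` dimensions."""
    rng = random.Random(seed)
    data = []
    for i in range(n_points):
        center = 0.0 if i % 2 == 0 else 5.0
        data.append([
            rng.gauss(center, 0.3) if d in relevant else rng.uniform(0.0, 5.0)
            for d in range(n_dims)
        ])
    return data

data = make_data()
best = evolve(data, k=2)
print("selected dimensions:", [i for i, bit in enumerate(best) if bit])
```

Because the elites are carried over unchanged and the full-space mask is part of the initial population, the returned mask never scores worse than clustering on all dimensions under this stand-in fitness. The stand-in does not reward keeping every relevant dimension, so a fitness closer to the one the abstract describes would need an explicit per-feature contribution term.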

Cite this article

SUN Haojun,XIONG Langhuan . A Genetic Algorithm forHighDimensional Data Clustering[J]. Computer Engineering & Science, 2010 , 32(8) : 94 -97 . DOI: 10.3969/j.issn.1007130X.2010.
