• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2012, Vol. 34 ›› Issue (9): 135-142.

• Articles •

A TwoPhase Clustering Algorithm Based on Near Neighbor Connection for Mixed Data Set

CHEN Xinquan   

  1. (School of Computer Science and Engineering, Chongqing Three Gorges University, Chongqing 404000, China)
  • Received: 2012-04-13  Revised: 2012-06-19  Online: 2012-09-25  Published: 2012-09-25

Abstract:

To preprocess mixed data sets effectively, this paper first gives several definitions and related properties, then presents a two-phase clustering algorithm based on near neighbor connection. Several ideas and techniques for improving the algorithm's time efficiency are also described. Simulation experiments on artificial data sets and UCI standard data sets show that the algorithm often achieves better clustering quality than the k-means algorithm and the AP algorithm on data sets with apparent clusters, which suggests that it has practical value. Finally, several directions for future research are given for further exploring and popularizing the method.

Key words: mixed attributes; cluster feature; primary cluster; near neighbor graph