• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2012, Vol. 34 ›› Issue (9): 135-142.

• Article •

A Two-Stage Clustering Algorithm Based on Near Neighbor Connection for Mixed-Attribute Data Sets

陈新泉   

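  1. (School of Computer Science and Engineering, Chongqing Three Gorges University, Chongqing 404000)
  • Received: 2012-04-13 Revised: 2012-06-19 Online: 2012-09-25 Published: 2012-09-25
  • Supported by:

    Scientific Research Project Plan of Chongqing Three Gorges University (11ZZ-058)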

A TwoPhase Clustering Algorithm Based on Near Neighbor Connection for Mixed Data Set

CHEN Xinquan   

  1. (School of Computer Science and Engineering, Chongqing Three Gorges University, Chongqing 404000, China)
  • Received: 2012-04-13 Revised: 2012-06-19 Online: 2012-09-25 Published: 2012-09-25

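Abstract (translated from the Chinese):

To meet the data preprocessing needs of mixed-attribute data sets, this paper gives several definitions and related properties and then proposes a two-stage clustering algorithm based on near neighbor connection. Ideas and techniques for improving the algorithm are given to raise its time efficiency. Simulation results on several artificial data sets and UCI standard data sets show that, for some data sets with a clearly clustered distribution structure, the algorithm often achieves better clustering accuracy than the k-means algorithm and the AP algorithm, which indicates that it is effective to some degree. To further extend the algorithm and uncover its practical application value, several research prospects are given at the end.

Keywords (translated from the Chinese): mixed attributes, cluster feature, primary clustering, near neighbor graph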

Abstract:

To preprocess mixed-attribute data sets effectively, this paper first gives several definitions and related properties, then presents a two-phase clustering algorithm based on near neighbor connection. To improve the algorithm's time efficiency, some ideas and techniques for improving it are described. Simulation experiments on several artificial data sets and UCI standard data sets show that, for data sets with an apparent cluster structure, the algorithm often achieves better clustering quality than the k-means algorithm and the AP algorithm, which indicates that it is effective to a certain degree. Finally, several directions for future research are given to further extend the method and uncover its practical value.

Key words: mixed attributes; cluster feature; primary cluster; near neighbor graph