• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2011, Vol. 33 ›› Issue (12): 110-115.

• Papers •

A Weighted Rough Clustering Algorithm Based on New Similarity Measure

SUN Xiaobo, LIAO Guiping

  1. (School of Information Science and Technology, Hunan Agricultural University, Changsha 410128, China)
  • Received: 2011-02-20 Revised: 2011-05-28 Online: 2011-12-24 Published: 2011-12-25

Abstract:

Clustering is an important research direction in data mining. To address the shortcomings of the similarity measures used in existing clustering algorithms, this paper first proposes a new similarity measure. On this basis, the discernibility power of rough set theory is introduced into clustering to measure the importance of attributes, yielding a new weighted rough clustering algorithm that can handle categorical (symbolic) data. Experiments on UCI datasets, which compare the algorithm with other algorithms, show that it is insensitive to the input order of the data, does not require the number of clusters to be specified in advance, and improves clustering quality.

Key words: clustering analysis; rough set; similarity; data mining
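The abstract does not give the paper's similarity measure or weighting formula, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes (hypothetically) that an attribute's weight is the share of object pairs it distinguishes, that similarity is the summed weight of matching attributes, and that clusters are the connected components of a thresholded similarity graph. The function names and the threshold value are all assumptions made for illustration.

```python
from itertools import combinations


def discernibility_weights(rows):
    """Assumed weighting: share of all pairwise distinctions made by each attribute."""
    n_attrs = len(rows[0])
    counts = [0] * n_attrs
    for a, b in combinations(rows, 2):
        for j in range(n_attrs):
            if a[j] != b[j]:
                counts[j] += 1
    total = sum(counts)
    if total == 0:
        # No attribute distinguishes any pair: fall back to uniform weights.
        return [1.0 / n_attrs] * n_attrs
    return [c / total for c in counts]


def weighted_similarity(a, b, weights):
    """Assumed similarity: summed weight of the attributes on which a and b agree."""
    return sum(w for x, y, w in zip(a, b, weights) if x == y)


def threshold_clusters(rows, weights, threshold):
    """Group row indices into connected components of the graph whose edges
    link pairs with similarity >= threshold (a stand-in clustering step)."""
    parent = list(range(len(rows)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(rows)), 2):
        if weighted_similarity(rows[i], rows[j], weights) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(rows)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy categorical data; the third attribute is constant and gets weight 0.
    rows = [
        ("red", "round", "wild"),
        ("red", "oval", "wild"),
        ("blue", "square", "wild"),
        ("blue", "square", "wild"),
    ]
    weights = discernibility_weights(rows)
    print(weights)
    print(threshold_clusters(rows, weights, threshold=0.4))
```

In this sketch the number of clusters follows from the threshold rather than being supplied up front, and connected components do not depend on the order in which the rows are listed. These are the two properties the abstract highlights, though the paper may achieve them by a different mechanism.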