• Official journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• Article

A K-modes clustering algorithm based on Bayes distance measure

ZHAO Liang¹, LIU Jianhui², ZHANG Zhaozhao²

  1. Graduate School, Liaoning Technical University, Fuxin 123000, Liaoning, China
  2. School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125000, Liaoning, China
  • Received: 2015-06-05 Revised: 2015-11-27 Online: 2017-01-25 Published: 2017-01-25
  • Funding: National Natural Science Foundation of China (61440059); Natural Science Foundation of Liaoning Province (LS2013129)

Abstract:

The original distance measure for categorical variables in the K-modes clustering algorithm cannot reflect the differences between attribute values. To overcome this drawback, we propose a new distance measure based on the intermediate results of the Naive Bayes classifier. The measure constructs a feature vector to represent each categorical variable and uses the Euclidean distance between these feature vectors as the distance between the variables. We plug the proposed measure into the K-modes algorithm and compare it with other distance measures on several UCI public data sets; the experimental results show that the proposed measure is more effective.

Key words: K-modes clustering algorithm, categorical variables, Naive Bayes classifier, distance measure
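
The contrast drawn in the abstract can be illustrated with a small sketch. The abstract does not say which intermediate results of the Naive Bayes classifier make up the feature vectors, so the code below assumes one plausible reading: each attribute value is represented by the conditional probabilities of a second attribute's values given that value, estimated by counting as Naive Bayes does. The function names (`simple_matching`, `value_feature_vectors`, `euclidean`) and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from math import sqrt


def simple_matching(a, b):
    """Simple matching dissimilarity of classic K-modes:
    the number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def value_feature_vectors(records, attr, context_attr):
    """Assumed construction (not confirmed by the abstract): map each value v
    of column `attr` to the vector of estimated P(u | v) over the values u
    of column `context_attr`."""
    context_values = sorted({r[context_attr] for r in records})
    counts = defaultdict(Counter)
    for r in records:
        counts[r[attr]][r[context_attr]] += 1
    vectors = {}
    for v, c in counts.items():
        total = sum(c.values())
        vectors[v] = [c[u] / total for u in context_values]
    return vectors


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


# Toy records, invented for illustration: columns are (colour, size).
records = (
    [("red", "small")] * 3 + [("red", "large")] * 1
    + [("pink", "small")] * 2 + [("pink", "large")] * 2
    + [("blue", "large")] * 4
)

vectors = value_feature_vectors(records, attr=0, context_attr=1)
d_red_pink = euclidean(vectors["red"], vectors["pink"])
d_red_blue = euclidean(vectors["red"], vectors["blue"])

# Simple matching treats every pair of distinct values as equally far apart.
m_red_pink = simple_matching(("red",), ("pink",))
m_red_blue = simple_matching(("red",), ("blue",))
```

In this toy data simple matching gives the same dissimilarity for red/pink and red/blue, while the feature-vector distance places red closer to pink than to blue because their size distributions are more alike. A full K-modes run would use such a value-level distance when assigning records to modes; that step is omitted here.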