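• Journal of the China Computer Federation
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)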


A k-means algorithm for optimized initial clustering center based on discrete quantity

LIU Mei-ling1,2, HUANG Ming-xuan3, TANG Wei-dong1

  1. College of Information Science and Engineering, Guangxi University for Nationalities, Nanning 530006, China
  2. Guangxi Higher Education Key Laboratory of Science Computing and Intelligent Information Processing, Guangxi Teachers Education University, Nanning 530023, China
  3. College of Information and Statistics, Guangxi University of Finance and Economics, Nanning 530003, China

  • Received: 2015-10-08 Revised: 2016-02-24 Online: 2017-06-25 Published: 2017-06-25
  • Funding:

    National Natural Science Foundation of China (61262028); Guangxi Higher Education Science and Technology Research Project (KY2015ZD039); Guangxi University for Nationalities Research Project (2011MDYB032); Fund of the Guangxi Higher Education Key Laboratory of Science Computing and Intelligent Information Processing (GXSCIIP201201)

Abstract:

Traditional k-means selects its initial clustering centers at random, which makes the clustering results unstable. To address this problem, we propose an algorithm that uses discrete quantity to improve the selection of the initial clustering centers for k-means. The algorithm first treats all objects as a single large cluster. It then repeatedly takes the cluster containing the most objects, selects from it the two objects with the maximum and the minimum discrete quantity as initial clustering centers, and assigns each remaining object in that cluster to the nearer of the two centers. This splitting is repeated until the number of clusters equals the specified value k. Finally, the resulting k clusters are used as the initial clusters for the k-means algorithm. We apply the proposed algorithm, the traditional k-means algorithm, and the max-min distance clustering algorithm to several datasets. Experimental results show that the improved k-means algorithm selects unique initial clustering centers, needs fewer iterations, and produces stable clustering results with relatively high accuracy.

Key words: discrete quantity, k-means, clustering, clustering center
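
The initialization procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define discrete quantity, so the sketch assumes the information-theoretic measure of diversity, D(x) = N log N − Σ nᵢ log nᵢ with N = Σ nᵢ, applied to non-negative feature vectors; any other scoring function can be passed in instead. The function names (`diversity`, `initial_clusters`, `kmeans`), the tie-breaking rules, the use of squared Euclidean distance, and the choice of cluster centroids as the starting centers for k-means are all hypothetical choices made for this example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def diversity(x: Point) -> float:
    """ASSUMPTION: discrete quantity taken as N*log(N) - sum(n_i*log(n_i)).

    Only meaningful for non-negative components; the abstract gives no formula.
    """
    total = sum(x)
    if total <= 0:
        return 0.0
    return total * math.log(total) - sum(v * math.log(v) for v in x if v > 0)


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (assumed distance measure)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def initial_clusters(points: Sequence[Point], k: int,
                     dq: Callable[[Point], float] = diversity) -> List[List[int]]:
    """Split the largest cluster in two until there are k clusters of indices."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    clusters = [list(range(len(points)))]
    while len(clusters) < k:
        # Pick the cluster with the most objects (first one wins on ties).
        big = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(big)
        if len(members) < 2:
            raise ValueError("cannot split a cluster with a single object")
        hi = max(members, key=lambda i: dq(points[i]))
        lo = min(members, key=lambda i: dq(points[i]))
        if hi == lo:  # all scores equal: fall back to another member (assumption)
            lo = next(i for i in members if i != hi)
        near_hi, near_lo = [hi], [lo]
        for i in members:
            if i in (hi, lo):
                continue
            closer_to_hi = sq_dist(points[i], points[hi]) <= sq_dist(points[i], points[lo])
            (near_hi if closer_to_hi else near_lo).append(i)
        clusters += [near_hi, near_lo]
    return clusters


def centroid(points: Sequence[Point], members: List[int]) -> List[float]:
    """Component-wise mean of the selected points."""
    dim = len(points[0])
    return [sum(points[i][d] for i in members) / len(members) for d in range(dim)]


def kmeans(points: Sequence[Point], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Standard Lloyd iterations, started from the centroids of the initial clusters."""
    centers = [centroid(points, c) for c in initial_clusters(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = centroid(points, members)
    return labels, centers


# Small usage example on made-up 2-D data with two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, 2)
print(labels)
print(centers)
```

Because no step draws random numbers, repeated runs on the same data start from the same centers, which matches the stability property the abstract claims for the proposed initialization.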