A k-means algorithm for optimized initial clustering center based on discrete quantity

Computer Engineering & Science

LIU Mei-ling1,2, HUANG Ming-xuan3, TANG Wei-dong1

(1. College of Information Science and Engineering, Guangxi University for Nationalities, Nanning 530006;

2. Guangxi Higher Education Key Laboratory of Science Computing and Intelligent Information Processing, Guangxi Teachers Education University, Nanning 530023;

3. College of Information and Statistics, Guangxi University of Finance and Economics, Nanning 530003, China)

Received: 2015-10-08 Revised: 2016-02-24 Online: 2017-06-25 Published: 2017-06-25

Funding:
National Natural Science Foundation of China (61262028); Science and Technology Research Project of Guangxi Higher Education Institutions (KY2015ZD039); Research Project of Guangxi University for Nationalities (2011MDYB032); Fund of Guangxi Higher Education Key Laboratory of Science Computing and Intelligent Information Processing (GXSCIIP201201)

Abstract

The traditional k-means algorithm selects its initial clustering centers at random, which makes its clustering results unstable. To address this problem, we propose an algorithm that uses discrete quantity to improve how the initial clustering centers of k-means are chosen. The algorithm first treats all objects as one large cluster. It then repeatedly takes the cluster that contains the most objects, selects the two objects in it with the maximum and the minimum discrete quantity as initial clustering centers, and assigns every other object of that cluster to the nearer of the two centers. This splitting is repeated until the number of clusters equals the specified value k. Finally, the resulting k clusters are used as the initial clusters of the k-means algorithm. We apply the proposed algorithm, the traditional k-means algorithm and the max-min distance clustering algorithm to several datasets. Experimental results show that the improved k-means algorithm selects a unique set of initial clustering centers, needs fewer iterations, and produces stable clustering results with higher accuracy.

Key words: discrete quantity, k-means, clustering, clustering center
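The splitting procedure described in the abstract can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not define "discrete quantity", so the hypothetical helper `discrete_quantity` takes it to be the sum of Euclidean distances from an object to every object in its cluster. Euclidean distance, the tie-breaking rule and the names `initial_clusters`, `mean` and `kmeans` are likewise illustrative choices. The means of the k initial clusters serve as the starting centroids, and ordinary k-means (Lloyd's iterations) runs from there.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def discrete_quantity(p: Point, cluster: Sequence[Point]) -> float:
    # ASSUMPTION: the abstract does not define "discrete quantity"; this sketch
    # uses the sum of Euclidean distances from p to every object in its cluster.
    return sum(math.dist(p, q) for q in cluster)


def initial_clusters(points: Sequence[Point], k: int) -> List[List[Point]]:
    """Split the data into k initial clusters by repeated two-way splits."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    clusters: List[List[Point]] = [list(points)]  # all objects form one cluster
    while len(clusters) < k:
        # Pop the cluster that currently holds the most objects (it has >= 2).
        idx = max(range(len(clusters)), key=lambda j: len(clusters[j]))
        big = clusters.pop(idx)
        # Rank its objects by discrete quantity; the two ends of the ranking
        # are the minimum- and maximum-quantity objects (always distinct).
        order = sorted(range(len(big)), key=lambda i: discrete_quantity(big[i], big))
        c_min, c_max = big[order[0]], big[order[-1]]
        near_min, near_max = [c_min], [c_max]
        for i in order[1:-1]:
            p = big[i]
            # Assign every other object to the nearer centre (ties go to c_min).
            (near_min if math.dist(p, c_min) <= math.dist(p, c_max) else near_max).append(p)
        clusters.extend([near_min, near_max])
    return clusters


def mean(cluster: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100):
    """Standard k-means seeded with the means of the k initial clusters.

    Returns (labels, centroids, iterations); iterations counts the assignment
    passes performed, the last being the pass that found no change (unless
    max_iter was reached first).
    """
    centroids = [mean(c) for c in initial_clusters(points, k)]
    labels: List[int] = []
    iteration = 0
    for iteration in range(1, max_iter + 1):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = mean(members)
    return labels, centroids, iteration


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centroids, iterations = kmeans(data, k=2)
    print(labels, iterations)  # [1, 1, 1, 0, 0, 0] 2
```

Because the split sequence involves no random draw, repeated runs on the same data return the same initial clusters, which mirrors the uniqueness of the initial centers claimed in the abstract. Whether the iteration count drops relative to randomly seeded k-means depends on the data and on the paper's actual definition of discrete quantity.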

LIU Mei-ling, HUANG Ming-xuan, TANG Wei-dong. A k-means algorithm for optimized initial clustering center based on discrete quantity[J]. Computer Engineering & Science.
