A Canopy bisecting K-Means algorithm based on density and central index

Computer Engineering & Science, 2022, Vol. 44, Issue (02): 372-380.

Section: Artificial Intelligence and Data Mining

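SHEN Guo-xin1, JIANG Zhong-yun2

(1. College of Information, Shanghai Ocean University, Shanghai 201306; 2. College of Information Technology, Shanghai Jian Qiao University, Shanghai 201306, China)

Received: 2020-05-26  Revised: 2020-09-21  Accepted: 2022-02-25  Online: 2022-02-25  Published: 2022-02-18
Funding: Fund for Pilot Applied Undergraduate Majors of Shanghai Municipal Universities (Z32004-17-84)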

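Abstract

The bisecting K-Means algorithm chooses its initial centers at random and needs the number of clusters to be set by hand. Both make its clustering results unstable. To address this, we propose SDC_Bisecting K-Means, a Canopy bisecting K-Means algorithm based on density and central index. The algorithm works in three steps. First, it computes the density of each data point in the sample and its neighborhood radius. Next, it selects the data point with the smallest density and clusters the data following the idea of the Canopy algorithm. The number of resulting clusters and their centers then serve as input parameters to the bisecting K-Means algorithm. Finally, building on bisecting K-Means, it introduces an exponential function and a central index to cluster the original samples. Simulation experiments on UCI data sets and a self-built data set show that SDC_Bisecting K-Means gives more accurate clustering results, runs faster, and is more stable.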

Key words: clustering, bisecting K-Means algorithm, density, neighborhood radius, exponential function, central index
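The pipeline described in the abstract can be sketched in Python under explicit assumptions. The abstract does not give the paper's formulas for density, neighborhood radius, the exponential function, or the central index, so this is an illustration rather than the authors' SDC_Bisecting K-Means. The sketch makes these choices:

- The neighborhood radius is taken to be the mean pairwise Euclidean distance.
- A point's density is the number of other points inside that radius. Both of these choices are borrowed from common density-based Canopy variants.
- Each canopy is seeded at the lowest-density remaining point, as the abstract states.
- Plain bisecting K-Means then runs with the canopy count k and the canopy centers as inputs.

The helper names (`density_canopy`, `two_means`, `bisecting_kmeans`) are hypothetical. So is the rule that each split is seeded with the two canopy centers nearest the cluster centroid. The exponential function and central index are omitted.

```python
import math
from statistics import fmean


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(fmean(axis) for axis in zip(*points))


def sse(points):
    """Sum of squared distances to the centroid (standard bisecting K-Means split criterion)."""
    c = centroid(points)
    return sum(math.dist(p, c) ** 2 for p in points)


def density_canopy(points):
    """Hypothetical density-based Canopy step returning (k, centres); needs >= 2 points."""
    n = len(points)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    # Assumption: neighbourhood radius = mean pairwise Euclidean distance.
    radius = sum(math.dist(points[i], points[j]) for i, j in pairs) / len(pairs)
    # Assumption: density = number of other points strictly inside that radius.
    rho = [sum(1 for j in range(n) if j != i and math.dist(points[i], points[j]) < radius)
           for i in range(n)]
    remaining, centres = set(range(n)), []
    while remaining:
        # Following the abstract, the lowest-density remaining point seeds the next canopy
        # (ties broken by index so the result is deterministic).
        seed = min(remaining, key=lambda i: (rho[i], i))
        members = {i for i in remaining
                   if i == seed or math.dist(points[seed], points[i]) < radius}
        centres.append(centroid([points[i] for i in sorted(members)]))
        remaining -= members
    return len(centres), centres


def two_means(points, seeds, iters=100):
    """Lloyd's algorithm with k = 2 from fixed seeds; returns None if a half empties."""
    centres = list(seeds)
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            groups[0 if math.dist(p, centres[0]) <= math.dist(p, centres[1]) else 1].append(p)
        if not groups[0] or not groups[1]:
            return None
        new = [centroid(g) for g in groups]
        if new == centres:
            break
        centres = new
    return groups


def bisecting_kmeans(points, k, canopy_centres):
    """Plain bisecting K-Means using the canopy output (no exponential function or central index)."""
    clusters = [list(points)]
    while len(clusters) < k:
        # Split the cluster with the largest SSE, as in standard bisecting K-Means.
        target = max(clusters, key=sse)
        if len(target) < 2:
            break
        c = centroid(target)
        # Assumption: seed the split with the two canopy centres nearest this cluster's centroid.
        near = sorted(canopy_centres, key=lambda z: math.dist(z, c))[:2]
        halves = two_means(target, near) if len(near) == 2 else None
        if halves is None:
            # Fallback seeds: the point farthest from the centroid, then the point farthest from it.
            a = max(target, key=lambda p: math.dist(p, c))
            b = max(target, key=lambda p: math.dist(p, a))
            halves = two_means(target, (a, b))
        if halves is None:
            break
        clusters.remove(target)
        clusters.extend(halves)
    return clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    k, centres = density_canopy(data)
    print(k, centres)
    for cluster in bisecting_kmeans(data, k, centres):
        print(sorted(cluster))
```

Every seed in this sketch is chosen deterministically, so repeated runs return identical clusters. That is the kind of stability the abstract targets. The accuracy and speed gains reported for SDC_Bisecting K-Means are not reproduced here.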

Cite this article: SHEN Guo-xin, JIANG Zhong-yun. A Canopy bisecting K-Means algorithm based on density and central index[J]. Computer Engineering & Science, 2022, 44(02): 372-380.
