A continuous K-means equivalent clustering model and its optimization algorithm

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (11): 2077-2083.

XIE Ting1, LIU Rui-hua2, WEI Zheng-yuan1

(1. School of Science, Chongqing University of Technology, Chongqing 400054, China; 2. School of Artificial Intelligence, Chongqing University of Technology, Chongqing 400054, China)

Received: 2020-07-10 Revised: 2020-09-29 Online: 2021-11-25 Published: 2021-11-23
Funding:
Natural Science Foundation of Chongqing (cstc2019jcyj-msxmX0491); Youth Project of the Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN201901145); Scientific Research Project of Chongqing University of Technology (2009ZD55)

Abstract

Abstract: As an unsupervised learning method, clustering is an important research topic in data science. K-means is a partition-based clustering algorithm that is generally solved heuristically, since the underlying discrete optimization problem is NP-hard. To make K-means more applicable to big data problems, a class of non-convex continuous clustering optimization models equivalent to K-means is designed, starting from the properties of the clustering matrix, and a fast optimization algorithm for this equivalent model is given within the ADMM framework. Numerical experiments show that the model and its optimization algorithm are accurate and efficient for big data clustering. In addition, the properties of the model and the question of its equivalence to K-means are discussed.

Key words: K-means, clustering, sparse, alternating direction method of multipliers
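
The abstract does not state the proposed model, so the following is only a point of reference for the heuristic it contrasts against. It is a minimal pure-Python sketch of the standard Lloyd iteration for K-means, assuming squared Euclidean distance and random initialization from the data; it is not the continuous model or the ADMM solver from the paper. The function names `lloyd_kmeans` and `kmeans_objective` and the toy data are illustrative assumptions.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, iters=100, seed=0):
    """Lloyd's heuristic: alternate nearest-center assignment and mean update."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: the discrete variables (one cluster label per point).
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the iteration has converged
        labels = new_labels
        # Update step: each center becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers


def kmeans_objective(points, labels, centers):
    """Sum of squared distances from each point to its assigned center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = lloyd_kmeans(points, k=2)
print(labels, kmeans_objective(points, labels, centers))
```

The assignment step is where the discreteness enters: each point receives exactly one label, so the search space is combinatorial and the iteration only guarantees a local optimum. A continuous reformulation of the kind the abstract describes would replace these hard labels with continuous variables.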

XIE Ting, LIU Rui-hua, WEI Zheng-yuan. A continuous K-means equivalent clustering model and its optimization algorithm[J]. Computer Engineering & Science, 2021, 43(11): 2077-2083.
