Research and implementation of a sparse local scaling parallel spectral clustering algorithm based on MPI

J4, 2016, Vol. 38, Issue (05): 839-847.

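LI Ruilin1,2, ZHAO Yonghua1, HUANG Xiaolei2,3

(1. Department of High Performance Computing, Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190;
2. University of Chinese Academy of Sciences, Beijing 100190; 3. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China)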

Received: 2015-12-14  Revised: 2016-02-17  Published: 2016-05-25  Online: 2016-05-25
Funding: Open Fund of the State Key Laboratory of Mathematical Engineering and Advanced Computing (2014A03)

Abstract:

The spectral clustering algorithm is widely used in many fields because it can identify non-convex data distributions, effectively avoids locally optimal solutions, and is not limited by the dimensionality of the data points. However, as the volume and dimensionality of data grow, it becomes necessary to reduce the computation time as much as possible while preserving clustering accuracy. Moreover, besides the data set itself, the clustering quality of spectral clustering depends on several factors, including the method used to compute the distance matrix, the scaling parameter of the similarity matrix, and the form of the Laplacian matrix. To address these problems, we first apply the Message Passing Interface (MPI) parallel programming model to spectral clustering to handle large-scale data. We then use the t-nearest-neighbor method to approximate the large Laplacian matrix of the algorithm with a sparse one, and use the local scaling parameter to tune the scaling parameter automatically. Based on this analysis, we propose a parallel implementation of spectral clustering, the sparse local scaling parallel spectral clustering algorithm (SLSPSC), test it on four data sets, and compare it with the existing parallel spectral clustering algorithm (PSC) in terms of running time and clustering quality. Experimental results show that SLSPSC reduces the total time needed to compute the Laplacian matrix and substantially improves clustering quality on some of the data sets.
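The two preprocessing ideas named above, t-nearest-neighbor sparsification and local scaling, can be sketched as follows. This is a minimal serial Python sketch under stated assumptions, not the authors' SLSPSC code. It assumes the self-tuning local scale of Zelnik-Manor and Perona, where σ_i is the distance from x_i to its k-th nearest neighbor. It further assumes the affinity W_ij = exp(-d(x_i, x_j)^2 / (σ_i σ_j)) is stored only when j is among the t nearest neighbors of i or vice versa, and that the normalized form D^{-1/2} W D^{-1/2} is used. The helper names (`row_block`, `knn_rows`, `sparse_local_scaling_affinity`, `normalized_affinity`) and the parameter `k_scale` are hypothetical. The loop over `rank` only imitates how MPI processes might each own a contiguous block of rows; a real MPI program would run the blocks on separate processes and exchange neighbor lists (for example with an all-gather) before symmetrizing. The eigenvector and k-means steps of spectral clustering are omitted.

```python
import math


def row_block(rank, size, n):
    """Half-open row range [start, stop) owned by `rank` when n rows are split
    into `size` contiguous blocks (hypothetical 1-D block distribution)."""
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_rows(points, start, stop, t):
    """For each row i in [start, stop), return its t nearest neighbours as
    (index, squared distance) pairs, closest first, excluding i itself."""
    rows = {}
    for i in range(start, stop):
        dists = sorted((sq_dist(points[i], points[j]), j)
                       for j in range(len(points)) if j != i)
        rows[i] = [(j, d) for d, j in dists[:t]]
    return rows


def sparse_local_scaling_affinity(points, t, k_scale=None, size=1):
    """Sparse t-NN affinity with local scaling, stored as {(i, j): w_ij}.

    k_scale (must satisfy k_scale <= t) selects the neighbour whose distance
    is sigma_i; it defaults to t. `size` is the number of simulated processes.
    """
    n = len(points)
    k_scale = t if k_scale is None else k_scale
    if not 1 <= k_scale <= t < n:
        raise ValueError("need 1 <= k_scale <= t < number of points")
    neighbours = {}
    for rank in range(size):  # each iteration stands in for one MPI process
        start, stop = row_block(rank, size, n)
        neighbours.update(knn_rows(points, start, stop, t))
    # Local scale: distance from x_i to its k_scale-th nearest neighbour,
    # floored to avoid division by zero for duplicate points.
    sigma = [max(math.sqrt(neighbours[i][k_scale - 1][1]), 1e-12)
             for i in range(n)]
    W = {}
    for i in range(n):
        for j, d in neighbours[i]:
            w = math.exp(-d / (sigma[i] * sigma[j]))
            # Keep the edge if j is a t-NN of i OR i is a t-NN of j;
            # the weight formula is symmetric, so both entries agree.
            W[(i, j)] = w
            W[(j, i)] = w
    return W, sigma


def normalized_affinity(W, n):
    """Entries of D^{-1/2} W D^{-1/2}, where D is the diagonal degree matrix.
    The symmetric normalized Laplacian is the identity minus this matrix."""
    deg = [0.0] * n
    for (i, _), w in W.items():
        deg[i] += w
    return {(i, j): w / math.sqrt(deg[i] * deg[j]) for (i, j), w in W.items()}


if __name__ == "__main__":
    # Two well-separated groups of three points each.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    W, sigma = sparse_local_scaling_affinity(pts, t=2, size=4)
    M = normalized_affinity(W, len(pts))
    print(len(W), "stored entries out of", len(pts) ** 2)  # -> 12 stored entries out of 36
    print("cross-group edges:", [e for e in W if (e[0] < 3) != (e[1] < 3)])  # -> []
    print("largest normalized entry:", max(M.values()))
```

Only the t nearest neighbors of each point are stored, so the affinity holds at most 2nt entries instead of the n^2 of a dense similarity matrix, and the result does not depend on how many simulated processes share the rows. The brute-force neighbor search still costs O(n^2) distance evaluations; in this sketch that is the work the row-block distribution spreads across processes.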

Key words: parallel spectral clustering; sparsification; local scaling; MPI

LI Ruilin1,2 ,ZHAO Yonghua1,HUANG Xiaolei2,3. A sparse local scaling parallel
spectral clustering algorithm based on MPI [J]. J4, 2016, 38(05): 839-847.
