• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

计算机工程与科学 ›› 2021, Vol. 43 ›› Issue (10): 1730-1735.

• 高性能计算 • 上一篇    下一篇

基于压缩子空间对齐的多核聚类算法

欧琦媛,祝恩   

  1. (国防科技大学计算机学院,湖南 长沙 410073)

  • 收稿日期:2021-02-05 修回日期:2021-04-12 接受日期:2021-10-25 出版日期:2021-10-25 发布日期:2021-10-21

Multiple-kernel clustering based on compressed subspace alignment

OU Qi-yuan,ZHU En   

  1. (College of Computer Science and Technology,National University of Defense Technology,Changsha 410073,China)
  • Received:2021-02-05 Revised:2021-04-12 Accepted:2021-10-25 Online:2021-10-25 Published:2021-10-21

摘要: 近年来,多核聚类(MKC)在融合多源信息以提高聚类性能方面取得了显著进展。但是,以n表示样本数,O(n2)内存消耗和On3计算消耗限制了这些方法的实用性。重新设计了基于子空间分割的MKC公式,从而将其内存和计算复杂度分别降低到O(n)和O(n2)。在该算法(基于压缩子空间对齐的多核聚类算法CSA-MKC)中,通过对部分数据采样来重建整个数据集。具体而言,在该算法中,在信息融合过程中同时学习了共识采样矩阵,从而使生成的锚点集更适合于跨不同视图的数据重建。因此,改进了重构矩阵的判别性,并增强了聚类性能。此外,该算法易于并行化,通过GPU加速,在6个数据集上进行了测试,在时间上,该算法是数据规模的平方复杂度,在性能上,优于目前的先进算法。

关键词: 多核聚类, 子空间聚类, 子空间对齐, 多视图聚类, 大规模机器学习

Abstract: In recent years, multiple-kernel clustering (MKC) has achieved remarkable progress in fusing information from multi-source to boost the performance of clustering. However, denoting n as the sample number, the O(n2)  memory consumption and the  O(n3) computational consumption limit the practicality of these methods. In this paper, we redesign the formulation of subspace segmentation-based MKC, thereby reducing its memory and computational complexity to O(n)  and  O(n2), respectively. In the proposed algorithm, maned Compressed Subspace Alignment based Multiple Kernel Clustering(CSA-MKC), we sample only a part of the data to reconstruct the whole dataset. Specifically, in our design, a consensus sampling matrix is learned simultaneously with the information fusion process, so as to make the generated anchor point set more suitable for data reconstruction across different views. Consequently, the discriminative capability of the reconstruction matrix is improved, and the performance of clustering is enhanced. Moreover, since our algorithm is straightforward for parallelization, through the acceleration of GPU, our algorithm can achieve superior performance against the compared state-of-the-art methods on six datasets with square time cost. 




Key words: multiple-kernel clustering, subspace clustering, subspace alignment, multi-view cluster- ing, large-scale machine learning