• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学) ›› 2024, Vol. 46 ›› Issue (09): 1685-1692.

• Artificial Intelligence and Data Mining •

A deep subspace clustering algorithm based on dual self-expression and the maximum entropy principle

LI Meng (李猛), LIU Zi-yi (刘姿邑), SONG Yu-hang (宋宇航)

  1. (School of Computer Science, Xi’an Polytechnic University, Xi’an 710048, China)
  • Received: 2023-07-11 Revised: 2023-10-25 Accepted: 2024-09-25 Online: 2024-09-25 Published: 2024-09-23
  • Funding:
    Key Project of the Shaanxi Provincial Department of Education (22JS019)

Abstract: Deep subspace clustering algorithms use deep neural networks to map the original input data into a latent space and use the self-expressiveness of the data as a measure of similarity, which enables effective clustering of high-dimensional data. However, these algorithms consider only the self-expressive relationships in the latent space, so their performance depends heavily on the quality of the features extracted by the deep neural network. In addition, the regularization process ignores the connectivity within each subspace, which degrades the performance of spectral clustering. To address these issues, a deep subspace clustering algorithm based on dual self-expression and the maximum entropy principle is proposed. The algorithm learns the self-expressive relationships in the latent space and the input space simultaneously, guiding the deep neural network toward data representations suited to subspace clustering. Maximizing the entropy of the similarity matrix ensures that the elements belonging to the same subspace are uniformly and densely distributed, which improves clustering performance. Extensive experiments on five datasets verify the effectiveness of the proposed algorithm.

Key words: subspace clustering, self-expression, deep neural network, maximum entropy principle
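
The two ideas named in the abstract, self-expression in both the input and latent spaces and an entropy term on the similarity matrix, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the loss function, so the function names, the single coefficient matrix `coef` shared by both spaces, the affinity |C| + |C|^T, and the weights `lam_latent` and `lam_entropy` are all assumptions made for illustration. The sketch only evaluates such an objective for fixed inputs; in a deep model the latent features and the coefficients would be learned jointly.

```python
import math


def self_expression_residual(data, coef):
    """Squared error of rebuilding each sample (row) from the other samples.

    Sample i is approximated by sum_j coef[i][j] * data[j], skipping j == i
    so that a sample cannot trivially represent itself.
    """
    n = len(data)
    dim = len(data[0])
    total = 0.0
    for i in range(n):
        for k in range(dim):
            recon = sum(coef[i][j] * data[j][k] for j in range(n) if j != i)
            total += (data[i][k] - recon) ** 2
    return total


def affinity_entropy(coef, eps=1e-12):
    """Sum over rows of the Shannon entropy of the row-normalised affinity.

    The affinity |C| + |C|^T is an assumed (but common) way to turn
    self-expression coefficients into a symmetric similarity matrix.
    """
    n = len(coef)
    entropy = 0.0
    for i in range(n):
        row = [abs(coef[i][j]) + abs(coef[j][i]) if j != i else 0.0
               for j in range(n)]
        row_sum = sum(row)
        if row_sum <= eps:
            continue  # isolated sample: contributes no entropy
        for weight in row:
            p = weight / row_sum
            if p > 0.0:
                entropy -= p * math.log(p)
    return entropy


def dual_objective(x, z, coef, lam_latent=1.0, lam_entropy=0.1):
    """Hypothetical combined loss: input-space and latent-space residuals
    minus a weighted entropy term (so minimising the loss raises entropy)."""
    return (self_expression_residual(x, coef)
            + lam_latent * self_expression_residual(z, coef)
            - lam_entropy * affinity_entropy(coef))


# Four points on two lines through the origin (two 1-D subspaces).
x = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
coef = [[0.0, 0.5, 0.0, 0.0],
        [2.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.5],
        [0.0, 0.0, 2.0, 0.0]]
print(self_expression_residual(x, coef), affinity_entropy(coef))
```

In this toy example each point is rebuilt exactly from the other point on its line, so the residual vanishes. Each row of the affinity has a single neighbour, so its entropy is zero as well. Spreading the weights over more same-subspace neighbours would raise the entropy, which is the connectivity effect the abstract attributes to the maximum entropy term.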