• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (02): 292-302.

• Artificial Intelligence and Data Mining • Previous Articles     Next Articles

A model-based non-convex clustering algorithm

ZHONG Zhuo-hui1,2,CHEN Li-fei1,2,3   

  1. (1.College of Computer and Cyber Security,Fujian Normal University,Fuzhou 350117;
    2.Digital Fujian Internet-of-Things Laboratory of Environment Monitoring,Fuzhou 350117;
    3.Center for Applied Mathematics of Fujian Province,Fuzhou 350117,China)
  • Received:2022-06-28 Revised:2022-09-22 Accepted:2024-02-25 Online:2024-01-25 Published:2024-02-24

Abstract: Since data may be distributed on an irregular manifold, where the underlying clusters often exhibit non-convex shapes and structures, the clustering problems for such data are collectively referred to as non-convex clustering. However, existing mainstream non-convex clustering methods, including clustering based on original space and clustering based on Space Transformation, ignore the explicit description of non-convex data patterns and fail to understand and describe the underlying mechanisms that produce such structures. Therefore, a descriptive clustering model is proposed to act on non-convex clustering. Firstly, a feature-weighted kernel density model with a hybrid form is defined based on the kernel density method that does not need to assume any probability distribution model in advance and does not restrict the shape of clusters, which cannot be achieved by traditional model-based cluster- ing methods. Secondly, the clustering objective function is derived based on the proposed model, and an optimization algorithm for solving the local density maximum of the density function is proposed based on the expectation maximization algorithm. The clusters are divided into those samples that rise to the same density maximum of the density function. Finally, a model-based non-convex clustering algorithm is defined. The algorithm does not need to manually define the number of clusters, and can assign an explicit probability density function to each cluster, which helps to characterize clusters more robustly and accurately. In addition, the algorithm not only performs adaptive bandwidth selection, but also gives the feature weight of the sample space, enabling the automatic embedded feature selection during the clustering process.

Key words: non-convex clustering, descriptive model, model-based clustering, feature selection, kernel density estimation, local density maximum