• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)

• Artificial Intelligence and Data Mining

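A density-peak-based clustering algorithm for fuzzy mixed data

CHEN Yi-yan1,2, LI Ye3, LI Cun-jin1

  (1. School of Management and Economics, Beijing Institute of Technology, Beijing 100081;
    2. Academic Committee, China Academy of Management Science, Beijing 100036;
    3. University of Chinese Academy of Social Sciences (Graduate School), Beijing 102488)

  • Received: 2019-08-05 Revised: 2019-10-21 Published: 2020-02-25 Released: 2020-02-25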

A fuzzy mixed data clustering algorithm based on density peaks

CHEN Yi-yan1,2, LI Ye3, LI Cun-jin1

  (1. School of Management and Economics, Beijing Institute of Technology, Beijing 100081;
    2. Academic Committee, China Academy of Management Science, Beijing 100036;
    3. Graduate School, University of Chinese Academy of Social Sciences, Beijing 102488, China)

  • Received: 2019-08-05 Revised: 2019-10-21 Online: 2020-02-25 Published: 2020-02-25

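Abstract (Chinese version):

The CFSFDP algorithm is extended to continuous and discrete fuzzy sets, yielding an extended CFSFDP algorithm for fuzzy mixed data, named the FMD-CFSFDP algorithm. FMD-CFSFDP extends the classical information carried by each sample to fuzzy sets and clusters the fuzzy samples by searching for density peaks. It is a density-based clustering algorithm, built on fuzzy sets, for fuzzy mixed data.
First, the CFSFDP algorithm and its improvements are briefly introduced, and the mathematical concept of "fuzzy mixed data" is given. Then, building on the traditional fuzzy Euclidean distance, improved Euclidean distances with smaller error are proposed separately for continuous fuzzy sets and for discrete fuzzy sets, and on this basis an overall distance for mixed fuzzy data is constructed using weights. The clustering steps of FMD-CFSFDP are given with reference to those of CFSFDP. Random simulation experiments are then run under different sample sizes, numbers of indices, numbers of clusters, and value-sampling rules, and the clustering results are analyzed. Finally, the advantages and disadvantages of FMD-CFSFDP are summarized and improvement schemes are proposed on that basis, providing a reference for further research.

Keywords: fuzzy mixed data, density-peak-based clustering, FMD-CFSFDP algorithm, improved Euclidean distance, overall distance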

Abstract:

This paper extends the CFSFDP algorithm to continuous and discrete fuzzy sets and proposes an extended CFSFDP algorithm for fuzzy mixed data, named FMD-CFSFDP. FMD-CFSFDP extends the classical information carried by each sample to fuzzy sets and clusters the fuzzy samples by searching for density peaks. It is thus a density-based clustering algorithm, built on fuzzy sets, for fuzzy mixed data. First, the CFSFDP algorithm and some of its improved variants are briefly introduced, and a mathematical definition of fuzzy mixed data is given. Second, building on the traditional fuzzy Euclidean distance, improved Euclidean distances with smaller error are proposed for continuous fuzzy sets and for discrete fuzzy sets. On this basis, weights are introduced to construct an overall distance for fuzzy mixed data. The clustering steps of FMD-CFSFDP are then given by following those of CFSFDP. Random simulation experiments are carried out under different sample sizes, numbers of indices, numbers of clusters, and value-sampling rules, and the clustering results are analyzed. Finally, the advantages and disadvantages of FMD-CFSFDP are summarized, and improvement schemes are proposed on that basis, providing a reference for future in-depth research.

 

Keywords: fuzzy mixed data, density-peak-based clustering, FMD-CFSFDP algorithm, improved Euclidean distance, overall distance
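
The abstract names two ingredients without giving formulas: a weighted overall distance, and clustering by searching for density peaks. The sketch below illustrates only the generic shape of such a pipeline and is not the authors' FMD-CFSFDP. It assumes the overall distance is a convex combination of two precomputed distance matrices (the hypothetical `overall_distance` with weight `w`), uses a plain cutoff-count density, and stands in ordinary Euclidean points for fuzzy samples. The function names, the cutoff `d_c`, and the rule of picking centers by the product of density and separation are all assumptions made for illustration.

```python
import math


def overall_distance(d_cont, d_disc, w):
    """Hypothetical overall distance: w * d_cont + (1 - w) * d_disc, 0 <= w <= 1."""
    n = len(d_cont)
    return [[w * d_cont[i][j] + (1 - w) * d_disc[i][j] for j in range(n)]
            for i in range(n)]


def density_peaks(dist, d_c, n_clusters):
    """Density-peak clustering on a precomputed symmetric distance matrix."""
    n = len(dist)
    # Local density rho_i: number of other samples closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit samples from densest to sparsest (stable sort: ties broken by index).
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n    # distance to the nearest sample that ranks denser
    parent = [-1] * n    # index of that sample
    delta[order[0]] = max(dist[order[0]])  # convention for the densest sample
    for k in range(1, n):
        i = order[k]
        j = min(order[:k], key=lambda m: dist[i][m])
        delta[i], parent[i] = dist[i][j], j
    # Centers combine high density with large separation; the densest
    # sample is forced to be a center so every chain of parents ends at one.
    gamma = [rho[i] * delta[i] for i in range(n)]
    gamma[order[0]] = math.inf
    centers = sorted(range(n), key=lambda i: -gamma[i])[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Each remaining sample inherits the label of its nearest denser sample,
    # which has already been labeled because of the visiting order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta


# Usage: two well-separated groups of crisp 2-D points as stand-in data.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
labels, rho, delta = density_peaks(dist, d_c=1.2, n_clusters=2)
```

Because the clustering step only consumes a distance matrix, any distance defined on fuzzy samples could be passed in without changing `density_peaks`. That is the sense in which a density-peak method can be carried over to fuzzy mixed data.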