• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2014, Vol. 36 ›› Issue (01): 169-175.

• Articles

A sample-feature weighted possibilistic fuzzy kernel clustering algorithm

HUANG Weichun, LIU Jianlin, XIONG Liyan

  1. (School of Information Engineering, East China Jiaotong University, Nanchang 330013, China)
  • Received: 2012-06-13 Revised: 2012-10-22 Online: 2014-01-25 Published: 2014-01-25
  • Supported by:

    Natural Science Foundation of Jiangxi Province (20114BAB201028); East China Jiaotong University Research Fund (11QT04)

Abstract:

The classic fuzzy C-means clustering algorithm is sensitive to noisy data, ignores the imbalance among the attribute features of samples, and performs poorly on high-dimensional data. Possibilistic clustering addresses noise sensitivity and the coincident-cluster problem, but it assumes that every sample contributes equally to the clustering. To address these problems, a sample-feature weighted possibilistic fuzzy kernel clustering algorithm is proposed. Possibilistic clustering is incorporated into fuzzy clustering to improve robustness against noise and outliers. At the same time, based on the specific characteristics of each class, the algorithm dynamically computes an importance weight for each attribute feature with respect to each class and an importance weight for each sample with respect to the clustering. It also optimizes the kernel parameter and repeatedly updates the kernel function, so that data that are not linearly separable in the original space are mapped to a separable data set in a high-dimensional space. Experimental results show that the proposed algorithm reduces the influence of noisy data and outliers and achieves higher clustering accuracy than traditional clustering algorithms.

Key words: sample weighting; feature weighting; fuzzy C-means; possibilistic fuzzy clustering; kernel function
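
The abstract gives no formulas, so the following is only a minimal sketch of the ingredients it names, written under stated assumptions and not the authors' actual update rules. It combines a Gaussian kernel on a feature-weighted distance, the textbook fuzzy C-means membership update, and the textbook possibilistic typicality. All function names, the per-feature weights `feature_weights`, the kernel width `sigma`, and the scale `eta` are hypothetical choices for illustration. Per-sample weights and kernel-parameter optimization are not shown.

```python
import math

def weighted_gaussian_kernel(x, v, feature_weights, sigma):
    """Gaussian kernel on a feature-weighted squared distance (assumed form)."""
    sq = sum(w * (a - b) ** 2 for w, a, b in zip(feature_weights, x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))

def kernel_distance_sq(x, v, feature_weights, sigma):
    """Squared distance in feature space: for a Gaussian kernel K(x, x) = 1,
    so ||phi(x) - phi(v)||^2 = 2 * (1 - K(x, v))."""
    return 2.0 * (1.0 - weighted_gaussian_kernel(x, v, feature_weights, sigma))

def fuzzy_memberships(x, centers, feature_weights, sigma, m=2.0):
    """Textbook FCM membership update; memberships of one sample sum to 1."""
    d = [kernel_distance_sq(x, v, feature_weights, sigma) for v in centers]
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:  # sample coincides with one or more centers
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    inv = [di ** (-1.0 / (m - 1.0)) for di in d]
    total = sum(inv)
    return [g / total for g in inv]

def typicality(x, center, feature_weights, sigma, eta, m=2.0):
    """Textbook possibilistic typicality: 1 / (1 + (d^2 / eta)^(1/(m-1)))."""
    d = kernel_distance_sq(x, center, feature_weights, sigma)
    return 1.0 / (1.0 + (d / eta) ** (1.0 / (m - 1.0)))

# Hypothetical toy data: two cluster centers, one inlier, one outlier.
centers = [(0.0, 0.0), (5.0, 5.0)]
weights = (0.5, 0.5)
inlier, outlier = (0.2, -0.1), (50.0, 50.0)

u_inlier = fuzzy_memberships(inlier, centers, weights, sigma=2.0)
u_outlier = fuzzy_memberships(outlier, centers, weights, sigma=2.0)
t_inlier = typicality(inlier, centers[0], weights, sigma=2.0, eta=1.0)
t_outlier = typicality(outlier, centers[0], weights, sigma=2.0, eta=1.0)
```

In this toy setup the fuzzy memberships of the outlier are still forced to sum to 1, while its typicality is free to be lower than the inlier's. That difference is the usual motivation for adding a possibilistic term to improve robustness to noise.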