A sample-feature weighted possibilistic fuzzy kernel clustering algorithm

J4 ›› 2014, Vol. 36 ›› Issue (01): 169-175.

HUANG Weichun (黄卫春), LIU Jianlin (刘建林), XIONG Liyan (熊李艳)

(School of Information Engineering, East China Jiaotong University, Nanchang, Jiangxi 330013, China)

Received: 2012-06-13  Revised: 2012-10-22  Online: 2014-01-25  Published: 2014-01-25
Funding: Natural Science Foundation of Jiangxi Province (20114BAB201028); East China Jiaotong University research fund (11QT04)

Abstract:

The classic fuzzy C-means clustering algorithm is sensitive to noisy data, does not account for the imbalance among the attribute features of samples, and performs poorly on high-dimensional data. Possibilistic clustering addresses the noise-sensitivity and clustering-consistency problems, but it assumes that every sample contributes equally to the clustering. To address these problems, a sample-feature weighted possibilistic fuzzy kernel clustering algorithm is proposed. Possibilistic clustering is incorporated into fuzzy clustering to improve robustness against noise and outliers. At the same time, according to the specific characteristics of each cluster, the algorithm dynamically computes an importance weight for each attribute feature with respect to each cluster, as well as an importance weight for each sample with respect to the clustering. The kernel parameter is selected by optimization, and the kernel function is adjusted repeatedly so that a data set that is not linearly separable in the original space is mapped to a separable data set in a high-dimensional space. Experimental results show that the sample-feature weighted possibilistic fuzzy kernel clustering algorithm reduces the influence of noisy data and outliers and achieves higher clustering accuracy than traditional clustering algorithms.

Key words: sample weighting; feature weighting; fuzzy C-means; possibilistic fuzzy clustering; kernel function
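The abstract does not give the update equations, so the following is only a minimal sketch of the ingredients it names: a feature-weighted Gaussian kernel, fuzzy memberships, and possibilistic typicalities. It assumes the textbook fuzzy C-means and possibilistic C-means update forms applied to the kernel-induced distance 2(1 - K). The function names and the parameters `W`, `sigma`, `gamma`, `m`, and `eta` are illustrative assumptions, not the authors' notation. Sample weights, the weight updates, and the kernel-parameter optimization are omitted.

```python
import numpy as np


def weighted_gaussian_kernel(X, V, W, sigma):
    """Feature-weighted Gaussian kernel between samples and cluster centers.

    X: (n, d) samples; V: (c, d) centers; W: (c, d) non-negative feature
    weights, one row per cluster; sigma: kernel width (held fixed here).
    Returns K with
    K[i, k] = exp(-sum_j W[k, j] * (X[i, j] - V[k, j])**2 / sigma**2).
    """
    diff = X[:, None, :] - V[None, :, :]            # shape (n, c, d)
    sqdist = np.einsum("ncd,cd->nc", diff ** 2, W)  # weighted squared distance
    return np.exp(-sqdist / sigma ** 2)


def memberships_and_typicalities(K, gamma, m=2.0, eta=2.0, eps=1e-12):
    """Fuzzy memberships U and possibilistic typicalities T from kernel values.

    For a Gaussian kernel K(x, x) = 1, so the squared distance in feature
    space is ||phi(x) - phi(v)||^2 = 2 * (1 - K(x, v)).
    gamma: (c,) positive per-cluster scale for the typicalities (assumed given).
    """
    D = 2.0 * (1.0 - K) + eps                       # eps avoids division by zero
    # FCM-style memberships: each row sums to 1 across clusters.
    inv = D ** (-1.0 / (m - 1.0))
    U = inv / inv.sum(axis=1, keepdims=True)
    # PCM-style typicalities: each value lies in (0, 1], rows need not sum to 1.
    T = 1.0 / (1.0 + (D / gamma) ** (1.0 / (eta - 1.0)))
    return U, T


# Two tight groups plus one far-away point acting as an outlier.
X = np.array([[0.0, 0.0], [0.1, -0.1], [3.0, 3.0], [3.1, 2.9], [10.0, -10.0]])
V = np.array([[0.0, 0.0], [3.0, 3.0]])              # assumed cluster centers
W = np.full((2, 2), 0.5)                            # equal feature weights
K = weighted_gaussian_kernel(X, V, W, sigma=1.0)
U, T = memberships_and_typicalities(K, gamma=np.array([1.0, 1.0]))
print(np.round(U, 3))
print(np.round(T, 3))
```

In this toy setup the memberships of the outlier must still sum to 1, so it is split between the two clusters, while its typicalities for both clusters are lower than those of points near a center. That contrast is the usual motivation for combining the two quantities.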

HUANG Weichun, LIU Jianlin, XIONG Liyan. A sample-feature weighted possibilistic fuzzy kernel clustering algorithm[J]. J4, 2014, 36(01): 169-175.
