• A journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4, 2015, Vol. 37, Issue (10): 1965-1970.


An efficient SA-KNN algorithm with adaptive K value

SUN Ke1,3, GONG Yonghong1,2, DENG Zhenyun1,3

  1. Guangxi Key Laboratory of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China;
  2. Guilin University of Aerospace Technology, Guilin 541004, China;
  3. College of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, China
  • Received: 2014-09-22  Revised: 2014-11-26  Online: 2015-10-25  Published: 2015-10-25
  • Supported by:

    National Natural Science Foundation of China (61170131, 61263035); National 863 Program of China (2012AA011005); National 973 Program of China (2013CB329404); Natural Science Foundation of Guangxi (2012GXNSFGA060004); the Guangxi Bagui Innovation Team and the Guangxi Hundred Talents Program; Innovation Project of Guangxi Graduate Education (YCSZ2015095, YCSZ2015096)



Abstract:

The traditional K Nearest Neighbors (KNN) classification algorithm has several drawbacks in practice: it does not remove noisy samples, it does not preserve the manifold structure of the sample data during the transformation of the data space, and it ignores the correlation between samples. To address these problems, we propose an efficient SA-KNN algorithm with an adaptive K value. Drawing on sparse learning theory, we reconstruct each test sample from the training samples for KNN classification. The reconstruction process introduces an l2,1-norm to remove noisy samples, employs Locality Preserving Projections (LPP) to keep the data structure unchanged, and exploits the correlation between samples. These techniques yield the projection transformation matrix W, which in turn determines the value of K in the KNN algorithm. Simulation results on UCI data sets demonstrate that the proposed method achieves higher classification accuracy than the traditional KNN and EntropyKNN algorithms.
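For intuition, the following is a minimal sketch of the adaptive-K idea summarized above: each test sample is sparsely reconstructed from the training samples, and the number of significant reconstruction coefficients sets K for that sample. This is not the authors' implementation; it substitutes a plain l1 (Lasso) penalty for the paper's l2,1-norm-plus-LPP objective, and the function name and parameter values (sa_knn_predict, alpha, eps) are illustrative assumptions.

    # Illustrative sketch only -- NOT the paper's SA-KNN implementation.
    # A plain l1 (Lasso) penalty stands in for the l2,1-norm + LPP objective;
    # alpha and eps are assumed values chosen for demonstration.
    import numpy as np
    from sklearn.linear_model import Lasso

    def sa_knn_predict(X_train, y_train, X_test, alpha=0.05, eps=1e-3):
        preds = []
        for x in X_test:
            # Sparse reconstruction of the test sample from training samples:
            # x ~ X_train.T @ w, with w encouraged to be sparse.
            lasso = Lasso(alpha=alpha, max_iter=5000)
            lasso.fit(X_train.T, x)
            w = lasso.coef_
            # Adaptive K: count training samples with significant weight.
            k = max(1, int(np.sum(np.abs(w) > eps)))
            # Standard KNN majority vote among the k nearest training samples.
            dists = np.linalg.norm(X_train - x, axis=1)
            nearest = np.argsort(dists)[:k]
            labels, counts = np.unique(y_train[nearest], return_counts=True)
            preds.append(labels[np.argmax(counts)])
        return np.array(preds)

On a UCI-style data set, one would call sa_knn_predict(X_train, y_train, X_test) and compare the result against a fixed-K KNN baseline; the point of the sketch is that K varies per test sample according to how many training samples participate in its sparse reconstruction.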

Key words: K nearest neighbor (KNN) classification; correlation; removal of noise samples; locality preserving projection; sparse learning