• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

计算机工程与科学

• 高性能计算 • 上一篇    下一篇

基于聚类的环形kNN算法

匡振曦,武继刚,李嘉兴   

  1. (广东工业大学计算机学院,广东 广州 510006)
  • 收稿日期:2018-11-15 修回日期:2019-02-28 出版日期:2019-05-25 发布日期:2019-05-25
  • 基金资助:

    国家自然科学基金(61672171,61702115,61702114);广东省自然科学基金重点项目(2018B030311007)

A clustering-based annular k-nearest neighbor algorithm

KUANG Zhenxi,WU Jigang,LI Jiaxing   

  1. (School of Computers,Guangdong University of Technology,Guangzhou 510006,China)
  • Received:2018-11-15 Revised:2019-02-28 Online:2019-05-25 Published:2019-05-25

摘要:

传统k最近邻算法kNN在数据分类中具有广泛的应用,但该算法具有较多的冗余计算,致使处理高维数据时花费较多的计算时间。同时,基于地标点谱聚类的分类算法(LCkNN和RCkNN)中距离当前测试点的最近邻点存在部分缺失,导致其准确率降低。针对上述问题,提出一种基于聚类的环形k最近邻算法。提出的算法在聚类算法的基础上,首先将训练集中相似度较高的数据点聚成一个簇,然后以当前测试点为中心设置一个环形过滤器,最后通过kNN算法对过滤器中的点进行分类,其中聚类算法可以根据实际情况自由选择。算法性能已在UCI数据库中6组公开数据集上进行了实验测试,实验结果表明:AkNNE与AkNNH算法比kNN算法在计算量上平均减少51%,而在准确率上比LCkNN和RCkNN算法平均提高3%。此外,当数据在10 000维的情况下该算法仍然有效。
 

关键词: 环形过滤器, 聚类, 分类, 相邻簇心组, 三角不等式

Abstract:

The traditional knearest neighbor (kNN) algorithm has been widely used in data classification, but its redundant distance computation consumes much time for highdimensional data processing. Meanwhile, for the classification algorithms based on landmarkbased spectral clustering (LCkNN and RCkNN), the nearest neighbor points of the current testing point are partially missing, which results in low accuracy. Aiming at the above problems, we propose a clusteringbased annular knearest neighbor algorithm (AkNN). Based on the traditional clustering algorithm, the proposed algorithm collects data points with high similarity in the training set into a cluster. Then an annular filter centered on the testing point is set. Finally, the points in the filter are classified by the kNN algorithm. We can choose a clustering algorithm freely according to practical needs. The performance of the proposed algorithm is tested on 6 open data sets in the UCI database. Experimental results show that the AkNNE and AkNNH algorithms reduce calculation by 51% on average compared with the kNN algorithm, and enhance the accuracy by 3% on average compared with the LC-kNN and RC-kNN algorithms. In addition, the proposed algorithm is still valid in 10000 dimensional datasets.

 

 

Key words: annular filter, clustering, classification, neighboring centroid, triangle inequality