• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

A mislabeled data detection method based on sparse reconstruction weights

WU Jing-sheng, WANG Jing, DU Ji-xiang

  1. (School of Computer Science and Technology, Huaqiao University, Xiamen 361021, China)
  • Received: 2016-01-21 Revised: 2016-05-17 Online: 2017-11-25 Published: 2017-11-25

Abstract:

The accuracy of data classification depends on the quality and quantity of labeled data, and it degrades considerably when the training data contain mislabeled samples. To address this, we propose a method for detecting mislabeled data based on sparse reconstruction weights. First, for each sample in a training set that contains wrong labels, we find its neighbors with the k-nearest neighbor method and compute its local sparse reconstruction weights by solving a least squares (LS) model with the L1-norm. Second, we use the sparse reconstruction weights to calculate the label confidence level of every labeled sample. Finally, by locating the position of maximum curvature on the confidence curve, the method adaptively detects the mislabeled data. Experiments on real data demonstrate that the proposed algorithm is effective.
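
The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so everything specific here is an assumption. The L1-regularized least squares problem is solved with plain iterative soft-thresholding (ISTA), the confidence of a sample is taken to be the share of absolute reconstruction weight carried by neighbors with the same label, and the "maximum curvature" point is approximated by the largest positive second difference of the sorted confidence curve. The function names (`sparse_weights`, `label_confidence`, `detect_mislabeled`) and the parameters `k`, `lam`, and `n_iter` are hypothetical.

```python
import numpy as np


def sparse_weights(x, neighbors, lam=0.01, n_iter=200):
    """Assumed solver: ISTA for min_w 0.5*||x - A w||^2 + lam*||w||_1,
    where the columns of A are the neighbor points."""
    A = neighbors.T                                   # shape (d, k)
    w = np.zeros(A.shape[1])
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        z = w - step * (A.T @ (A @ w - x))            # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return w


def label_confidence(X, y, k=3, lam=0.01):
    """Assumed confidence: fraction of absolute weight on same-label neighbors."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                    # a point is not its own neighbor
    conf = np.zeros(n)
    for i in range(n):
        idx = np.argsort(dist[i])[:k]                 # k nearest neighbors of sample i
        w = np.abs(sparse_weights(X[i], X[idx], lam))
        total = w.sum()
        conf[i] = w[y[idx] == y[i]].sum() / total if total > 0 else 0.0
    return conf


def detect_mislabeled(conf):
    """Sort confidences in ascending order and cut at the sharpest upward bend.
    The second difference is only a discrete stand-in for curvature."""
    order = np.argsort(conf)
    curve = conf[order]
    if curve.size < 3:
        return order[:0]                              # too few points to find a bend
    bend = int(np.argmax(np.diff(curve, 2)))          # bend is centered at bend + 1
    return order[: bend + 2]                          # samples up to and including it
```

Under these assumptions, `detect_mislabeled(label_confidence(X, y))` returns the indices of the samples flagged as mislabeled. The cut point adapts to the shape of the confidence curve rather than relying on a fixed threshold, which is the property the abstract attributes to the maximum-curvature step.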

Key words: sparse reconstruction weight, mislabeled, confidence level, detection