• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


An improved SDRSMOTE algorithm
based on Euclidean distance

LI Ke-wen¹, LIN Ya-lin¹, YANG Yao-zhong²

  (1. College of Computer & Communication Engineering, China University of Petroleum, Qingdao 266580;
   2. Control Center of Informatization, Sinopec Shengli Oil Field, Dongying 257022, China)

     
  • Received: 2018-12-21   Revised: 2019-04-23   Online: 2019-11-25   Published: 2019-11-25

Abstract:

The SMOTE algorithm expands the minority class by synthesizing new samples, which improves classification performance on the minority class of an unbalanced data set. However, when synthesizing samples it chooses boundary samples blindly and draws the interpolation value at random. This paper improves the traditional SMOTE oversampling algorithm and proposes SDRSMOTE, which takes into account the distribution of all samples in the unbalanced data set and introduces the support degree sd and the influencing factor posFac to guide the synthesis of minority samples. On the WEKA platform, SMOTE and SDRSMOTE are used to preprocess six selected unbalanced data sets, and decision tree, AdaBoost, Bagging and Naive Bayes classifiers are used to classify the preprocessed data sets, with F-value, G-mean and AUC selected as evaluation indexes. The experiments show that data sets preprocessed by SDRSMOTE yield better classification results, which demonstrates the effectiveness of the algorithm.
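For orientation, the baseline that the abstract sets out to improve can be sketched as follows. This is a minimal illustration of classic SMOTE interpolation with Euclidean nearest neighbors, not the SDRSMOTE method: the abstract does not define sd or posFac, so they are not reproduced here. The function names (`euclidean`, `k_nearest`, `smote`, `f_value`, `g_mean`) are hypothetical, and the metric helpers assume that F-value means the F1 score and that G-mean is the geometric mean of the true positive and true negative rates.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(sample, pool, k):
    """Return the k points in pool closest to sample, excluding sample itself."""
    others = [p for p in pool if p is not sample]
    return sorted(others, key=lambda p: euclidean(sample, p))[:k]


def smote(minority, n_new, k=5, rng=None):
    """Classic SMOTE sketch: interpolate between a minority sample and one
    of its k nearest minority neighbors, using a uniform random gap."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = rng or random.Random()
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)                    # chosen blindly
        neighbor = rng.choice(k_nearest(base, minority, k))
        gap = rng.random()                             # random value in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def f_value(tp, fp, fn):
    """F1 score of the minority (positive) class; assumes nonzero denominators."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def g_mean(tp, fn, tn, fp):
    """Geometric mean of true positive rate and true negative rate."""
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(minority, 3, k=2, rng=random.Random(0)):
        print(point)
```

The two lines commented "chosen blindly" and "random value" correspond to the two weaknesses the abstract names: the unguided choice of the samples to interpolate from, and the uniformly random interpolation value. According to the abstract, SDRSMOTE guides these choices with sd and posFac instead.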

Key words: unbalanced data set, classification, boundary sample, support degree, influencing factor, Euclidean distance, SMOTE