• Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (10): 1856-1863.

An imbalanced data classification algorithm based on DPC clustering resampling combined with ELM

DONG Hong-cheng1,2, WEN Zhi-yun1,2,3, WAN Yu-hui1,2, YAN Fei-yang1,2

  1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  2. Research Center of New Telecommunication Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  3. Chongqing Information Technology Designing Co., Ltd., Chongqing 401121, China

  • Received: 2020-06-24 Revised: 2020-09-03 Accepted: 2021-10-25 Online: 2021-10-25 Published: 2021-10-22
  • About the author: DONG Hong-cheng, born in 1969, PhD, senior engineer; his research interests include big data and data analysis.

Abstract: Combining sampling techniques with the extreme learning machine (ELM) classification algorithm can improve classification accuracy on minority-class samples. However, most existing sampling methods that are combined with ELM take into account neither the degree of imbalance of the samples nor the distribution within the samples. They also rely on a single sampling technique, which leads to an inefficient classification model and a low recognition rate for minority-class samples. To solve this problem, this paper proposes an imbalanced data classification algorithm based on DPC clustering resampling combined with ELM. First, a mixed sampling model is constructed that balances the data set, handling two cases according to the data set's degree of imbalance. Second, within this model, the DPC clustering algorithm is applied to the majority and minority classes separately, which addresses intra-class imbalance and noise in the data and leaves the two classes relatively balanced. Finally, the resulting balanced data sets are classified using the ELM classification algorithm. Compared with classification algorithms of the same type, the proposed algorithm significantly improves both classification performance indexes, F-Measure and G-mean, on the experimental data sets.


Key words: extreme learning machine, imbalanced data classification, DPC clustering, resampling
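
The final classification and evaluation stage described in the abstract can be illustrated with a minimal sketch. It assumes the textbook ELM formulation (random, untrained hidden-layer weights and output weights solved by a pseudo-inverse) and the usual definitions of F-Measure and G-mean, with the minority class labeled 1. It is not the authors' implementation: the DPC-based resampling step is omitted because the abstract does not specify its rules, and the names `train_elm`, `predict_elm`, and `f_measure_and_g_mean` are hypothetical.

```python
import numpy as np


def _hidden(X, W, b):
    # Sigmoid activations of the random hidden layer.
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def train_elm(X, y, n_hidden=20, seed=0):
    """Fit a basic binary ELM (assumed textbook form, not the paper's code)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random hidden biases, never trained
    targets = np.where(y == 1, 1.0, -1.0)        # encode labels as +1 / -1
    # Output weights are the least-squares solution via the pseudo-inverse.
    beta = np.linalg.pinv(_hidden(X, W, b)) @ targets
    return W, b, beta


def predict_elm(model, X):
    W, b, beta = model
    return (_hidden(X, W, b) @ beta > 0).astype(int)


def f_measure_and_g_mean(y_true, y_pred):
    """F-Measure and G-mean, treating label 1 as the minority (positive) class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g = float(np.sqrt(recall * specificity))
    return float(f), g


# Toy imbalanced data (synthetic, for illustration only): 90 majority, 10 minority.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (90, 2)), rng.normal(3.0, 1.0, (10, 2))])
y = np.concatenate([np.zeros(90, dtype=int), np.ones(10, dtype=int)])

model = train_elm(X, y)
y_pred = predict_elm(model, X)
f, g = f_measure_and_g_mean(y, y_pred)
print(f"F-Measure={f:.3f}, G-mean={g:.3f}")
```

G-mean is the geometric mean of the recall on each class, so it drops toward zero whenever the minority class is largely misclassified, which is why it is commonly reported alongside F-Measure for imbalanced data.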