• Official journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2021, Vol. 43, Issue (05): 917-925.


A multi-granularity ensemble classification algorithm for imbalanced data

CHEN Li-fang, DAI Qi, ZHAO Jia-liang

  (College of Science, North China University of Science and Technology, Tangshan 063210, China)


  • Received: 2020-03-07  Revised: 2020-05-13  Accepted: 2021-05-25  Online: 2021-05-25  Published: 2021-05-19

Abstract: To address the low accuracy, poor stability and weak generalization ability of traditional models in imbalanced data classification, a sequential three-way decision multi-granularity ensemble classification algorithm is proposed. A binary relation is used to divide the granular layers dynamically, the decision thresholds are computed from a cost matrix, and a multi-layer granular structure is constructed. The data on each granular layer are divided into a positive region, a boundary region and a negative region, and these regions are then recombined pairwise (positive and negative, positive and boundary, negative and boundary) to form new data subsets. A base classifier is built on each data subset, and the base classifiers are combined to classify the imbalanced data as an ensemble. Simulation results show that the algorithm effectively reduces the imbalance ratio of the data subsets and increases the diversity among the base classifiers in ensemble learning. Under both the G-mean and F1-measure evaluation indexes, its classification performance is superior, or partially superior, to that of other ensemble classification algorithms. The new algorithm improves the classification accuracy and stability of the model and offers a new line of research for ensemble learning on imbalanced data sets.
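The pipeline summarized in the abstract (cost matrix, thresholds, three-region division, pairwise recombination, one base classifier per subset, ensemble vote, evaluation by G-mean and F1) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the threshold formulas are the standard decision-theoretic rough set ones (the paper's exact form is not given in the abstract), the conditional probabilities `probs` are supplied by hand instead of being estimated from granules induced by the paper's binary relation, only a single granular layer is shown, and base-classifier training is replaced by a plain majority vote over given 0/1 predictions. All function names and cost values are hypothetical.

```python
from math import sqrt


def dtrs_thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Return (alpha, beta) from a decision-theoretic rough set cost matrix.

    l_XY is the cost of action X (P = accept into the positive region,
    B = defer to the boundary region, N = reject into the negative region)
    when the sample's true class is Y (P = minority, N = majority).
    Assumes l_pp <= l_bp < l_np and l_nn <= l_bn < l_pn (standard DTRS form,
    used here as an assumption; the paper's formula may differ).
    """
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta


def three_way_split(probs, alpha, beta):
    """Split sample indices into POS/BND/NEG by an estimated P(minority | granule)."""
    pos, bnd, neg = [], [], []
    for i, p in enumerate(probs):
        if p >= alpha:
            pos.append(i)
        elif p <= beta:
            neg.append(i)
        else:
            bnd.append(i)
    return pos, bnd, neg


def recombine(pos, bnd, neg):
    """Form the three training subsets POS+NEG, POS+BND and NEG+BND."""
    return [sorted(pos + neg), sorted(pos + bnd), sorted(neg + bnd)]


def imbalance_ratio(labels, idx):
    """Larger class count / smaller class count in a subset (inf if a class is absent)."""
    n_pos = sum(1 for i in idx if labels[i] == 1)
    n_neg = len(idx) - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("inf")
    return max(n_pos, n_neg) / min(n_pos, n_neg)


def majority_vote(predictions):
    """Combine per-classifier 0/1 prediction lists by simple majority (hypothetical stand-in)."""
    return [1 if 2 * sum(col) > len(col) else 0 for col in zip(*predictions)]


def g_mean_f1(y_true, y_pred):
    """G-mean = sqrt(recall * specificity); F1 = harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    g_mean = sqrt(recall * specificity)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return g_mean, f1


if __name__ == "__main__":
    # Hypothetical cost matrix and probability estimates, for illustration only.
    alpha, beta = dtrs_thresholds(l_pp=0, l_bp=2, l_np=6, l_pn=4, l_bn=1, l_nn=0)
    probs = [0.9, 0.7, 0.5, 0.3, 0.1, 0.05, 0.15, 0.65]
    labels = [1, 1, 0, 1, 0, 0, 0, 0]  # 1 = minority class
    pos, bnd, neg = three_way_split(probs, alpha, beta)
    for subset in recombine(pos, bnd, neg):
        print(subset, imbalance_ratio(labels, subset))
```

With these illustrative costs the thresholds are alpha = 0.6 and beta = 0.2, so a sample enters the positive region when its estimated minority probability is at least 0.6 and the negative region when it is at most 0.2. Calling `imbalance_ratio` on each recombined subset and on the full index set is one way to inspect how the recombination changes class balance on real data.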




Key words: sequential three-way decision; multi-granularity; cost-sensitive; imbalanced data; ensemble learning