• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


An imbalanced data classification method
based on probability threshold Bagging

ZHANG Zhonglin,WU Dangping   

  1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
  • Received: 2018-06-11 Revised: 2018-09-06 Online: 2019-06-25 Published: 2019-06-25

Abstract:

Class imbalance is widespread in real-world data. Most traditional classifiers assume a balanced class distribution or equal misclassification costs, so their classification performance degrades seriously on imbalanced data. To address the classification of imbalanced data sets, we propose a probability threshold Bagging classification algorithm, called PT-Bagging. The algorithm combines the threshold-moving technique with the Bagging ensemble algorithm: in the training phase it trains on the training set with its original class distribution, and in the prediction phase it moves the decision threshold, using calibrated posterior probability estimates to maximize the average performance measure for imbalanced data classification. Experimental results show that the PT-Bagging algorithm classifies imbalanced data more effectively.

Key words: imbalanced data, threshold-moving, Bagging ensemble learning, posterior probability
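
The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the k-nearest-neighbour base learner, the toy one-dimensional data, the function names, and the choice of the minority-class prior as the moved threshold are all assumptions made for illustration, and the probability calibration step mentioned in the abstract is omitted.

```python
import random


def knn_posterior(train, x, k=5):
    """Estimate P(y=1 | x) as the share of positives among the k nearest points.

    Hypothetical base learner, chosen only to keep the sketch dependency-free.
    """
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def bagged_posterior(train, x, n_estimators=25, k=5, seed=0):
    """Average base-learner posteriors over bootstrap resamples.

    Training phase: each resample is drawn from the original training set,
    so the class distribution is not rebalanced.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_estimators):
        sample = [rng.choice(train) for _ in train]  # bootstrap with replacement
        total += knn_posterior(sample, x, k)
    return total / n_estimators


def classify(posterior, threshold):
    """Prediction phase: label as positive when the posterior reaches the threshold."""
    return int(posterior >= threshold)


# Toy imbalanced data (assumed): 90 negatives near 0.0, 10 positives near 2.0.
data_rng = random.Random(42)
train = [(data_rng.gauss(0.0, 1.0), 0) for _ in range(90)]
train += [(data_rng.gauss(2.0, 1.0), 1) for _ in range(10)]

# Assumed threshold choice: the minority-class prior instead of the default 0.5.
prior = sum(label for _, label in train) / len(train)

grid = [i / 10 for i in range(-20, 41)]
posteriors = [bagged_posterior(train, x) for x in grid]
pred_default = [classify(p, 0.5) for p in posteriors]
pred_moved = [classify(p, prior) for p in posteriors]

print("positives at threshold 0.5:  ", sum(pred_default))
print("positives at threshold prior:", sum(pred_moved))
```

Because the moved threshold is lower than 0.5, every point labelled positive at 0.5 stays positive, and additional points near the class boundary may be assigned to the minority class. Which threshold is best depends on the performance measure being maximized.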