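• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)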

• Artificial Intelligence and Data Mining

An imbalanced data classification method based on probability threshold Bagging

ZHANG Zhonglin, WU Dangping

  1. (School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China)
  • Received: 2018-06-11 Revised: 2018-09-06 Online: 2019-06-25 Published: 2019-06-25
  • Funding:

    National Natural Science Foundation of China (61662043)

Abstract:

Class imbalance is widespread in real-world data. Most traditional classifiers assume a balanced class distribution or equal misclassification costs, so their classification performance degrades severely on imbalanced data. To address the classification of imbalanced data sets, we propose a probability threshold Bagging classification method, called PT-Bagging. The method combines the threshold-moving technique with the Bagging ensemble algorithm. In the training phase, it trains on the training set with its original class distribution. In the prediction phase, it applies decision threshold moving, using calibrated posterior probability estimates to maximize the performance measure for imbalanced data classification. Experimental results show that PT-Bagging offers an advantage in classifying imbalanced data.

Key words: imbalanced data, threshold moving, Bagging ensemble learning, posterior probability
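The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-feature decision stump base learner, the synthetic data, the helper names (`fit_stump`, `fit_bagging`, `predict`), and the choice of the minority-class prior as the moved threshold are all assumptions made for illustration, and the averaged ensemble output is simply treated as the posterior estimate without a separate calibration step.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-feature decision stump; each leaf stores its positive-class rate."""
    best = None
    for cut in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= cut]
        right = [y for x, y in zip(xs, ys) if x > cut]
        if not left or not right:
            continue
        p_left, p_right = sum(left) / len(left), sum(right) / len(right)
        # Squared error of the leaf probabilities against the labels
        err = (sum((y - p_left) ** 2 for y in left)
               + sum((y - p_right) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, cut, p_left, p_right)
    if best is None:  # degenerate bootstrap sample: all x values identical
        p = sum(ys) / len(ys)
        return (float("inf"), p, p)
    return best[1:]

def stump_proba(stump, x):
    cut, p_left, p_right = stump
    return p_left if x <= cut else p_right

def fit_bagging(xs, ys, n_estimators=25, seed=0):
    """Training phase: bootstrap the ORIGINAL (unresampled) class distribution."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict_proba(stumps, x):
    """Posterior estimate for the minority class: average over the ensemble."""
    return sum(stump_proba(s, x) for s in stumps) / len(stumps)

def predict(stumps, x, threshold=0.5):
    """Prediction phase: the decision threshold is the only thing that moves."""
    return int(predict_proba(stumps, x) >= threshold)

# Hypothetical imbalanced data: 190 majority (0) and 10 minority (1) samples.
data_rng = random.Random(1)
xs = ([data_rng.gauss(0.0, 1.0) for _ in range(190)]
      + [data_rng.gauss(1.5, 1.0) for _ in range(10)])
ys = [0] * 190 + [1] * 10

model = fit_bagging(xs, ys)
prior = sum(ys) / len(ys)  # minority-class prior, used here as the moved threshold

def recall(threshold):
    # Minority-class recall, measured on the training data for brevity
    hits = sum(predict(model, x, threshold) for x, y in zip(xs, ys) if y == 1)
    return hits / sum(ys)

recall_default = recall(0.5)
recall_moved = recall(prior)
print("minority recall at 0.5:", recall_default)
print("minority recall at prior:", recall_moved)
```

Because the ensemble is trained once and only the threshold changes, lowering the threshold can only add minority predictions, so minority recall never decreases. The cost is more false positives, which is why the threshold would in practice be chosen against the performance measure being maximized.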