• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

J4 ›› 2016, Vol. 38 ›› Issue (05): 1023-1030.

• 论文 • 上一篇    下一篇

一种单位代价收益决策树剪枝算法

周美琴,陈诗旭,袁鼎荣,朱新华   

  1. (广西师范大学广西“多源信息挖掘与安全”重点实验室,广西 桂林 541004)
  • 收稿日期:2015-06-10 修回日期:2015-08-01 出版日期:2016-05-25 发布日期:2016-05-25
  • 基金资助:

    国家自然科学基金(61462010,61363036)

A pruning algorithm of decision
tree based on unit cost gains  

ZHOU Meiqin,CHEN Shixu,YUAN Dingrong,ZHU Xinhua   

  1. (Guangxi Key Lab of MultiSource Information Mining & Security,Guangxi Normal University,Guilin 541004,China)
  • Received:2015-06-10 Revised:2015-08-01 Online:2016-05-25 Published:2016-05-25

摘要:

目前关于决策树剪枝优化方面的研究主要集中于预剪枝和后剪枝算法。然而,这些剪枝算法通常作用于传统的决策树分类算法,在代价敏感学习与剪枝优化算法相结合方面还没有较好的研究成果。基于经济学中的效益成本分析理论,提出代价收益矩阵及单位代价收益等相关概念,采用单位代价收益最大化原则对决策树叶节点的类标号进行分配,并通过与预剪枝策略相结合,设计一种新型的决策树剪枝算法。通过对生成的决策树进行单位代价收益剪枝,使其具有代价敏感性,能够很好地解决实际问题。实验结果表明,该算法能生成较小规模的决策树,且与REP、EBP算法相比具有较好的分类效果。

关键词: 代价, 收益, 剪枝算法, 决策树

Abstract:

At present, the research on pruning optimization of the decision tree focuses on prepruning and postpruning algorithms. However, these pruning algorithms are usually effective for the traditional decision tree classification algorithms. The research on the pruning optimization algorithm integrated with cost sensitive learning has no better results. We design a new pruning algorithm based on the benefit cost analysis theory of economics. We define the concepts of cost gains matrix and unit cost gains. Furthermore, the class labels of decision tree leaves are distributed on the principle of maximizing the gains of unit cost. Pruning the decision tree based on the unit cost gains can solve practical problems well. Experimental results show that the proposed algorithm outperforms current pruning algorithms, which can generate a smaller decision tree and which has a better classification effect in comparison with the REP and EBP algorithms.

Key words: cost;gains;pruning algorithm;decision tree