• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2015, Vol. 37 ›› Issue (09): 1783-1793.

• Papers

An optimized algorithm of decision tree based on correlation coefficients
(基于相关系数的决策树优化算法)

DONG Yuehua (董跃华), LIU Li (刘力)

  1. (School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China)
  • Received: 2014-08-25 Revised: 2014-12-29 Online: 2015-09-25 Published: 2015-09-25

Abstract:

This paper analyzes the basic principle of the ID3 algorithm and its multivalue bias problem, and proposes an optimized decision tree algorithm based on correlation coefficients. First, the correlation coefficients between the attributes are introduced to improve the ID3 algorithm and overcome its multivalue bias. Then, properties of the Taylor formula and the Maclaurin formula are used to approximate and simplify the information gain formula. A worked example on concrete data shows that the optimized ID3 algorithm resolves the multivalue bias problem. Experiments on standard UCI data sets show that the optimized algorithm improves average classification accuracy, reduces the complexity of building the decision tree, and thereby shortens the tree generation time. The efficiency improvement is most pronounced when the data set contains a large number of samples.

Key words: ID3 algorithm; correlation coefficient; decision tree; Taylor formula; information gain
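
The multivalue bias that the abstract refers to can be illustrated with a minimal sketch of the standard ID3 information gain criterion. This is not the authors' optimized algorithm: the correlation-coefficient correction and the Taylor/Maclaurin simplification are not specified on this page and are not reproduced here. The toy data set and the names `entropy`, `information_gain`, `id`, `windy`, and `play` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Standard ID3 information gain of splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    # Group the target labels by the value the attribute takes in each row.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    # Weighted average entropy of the partitions after the split.
    remainder = sum(
        len(part) / len(rows) * entropy(part) for part in partitions.values()
    )
    return base - remainder


# Hypothetical toy data: "id" is unique per row (many values, no predictive
# meaning), while "windy" is a binary attribute.
rows = [
    {"id": 1, "windy": "no", "play": "yes"},
    {"id": 2, "windy": "no", "play": "yes"},
    {"id": 3, "windy": "yes", "play": "no"},
    {"id": 4, "windy": "no", "play": "no"},
]

gain_id = information_gain(rows, "id", "play")
gain_windy = information_gain(rows, "windy", "play")
print(f"gain(id)    = {gain_id:.4f}")
print(f"gain(windy) = {gain_windy:.4f}")
```

In this sketch the row identifier splits the data into single-row partitions, each with zero entropy, so it receives the maximum possible gain even though it cannot generalize to unseen samples. Plain ID3 would therefore pick `id` over `windy`, which is the kind of bias toward many-valued attributes that the paper sets out to correct.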