• Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


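An improved ID3 algorithm based on correlation coefficients

WU Sibo, CHEN Zhigang, HUANG Rui

  1. (School of Software, Central South University, Changsha, Hunan 410075, China)
  • Received: 2016-07-16  Revised: 2016-09-01  Online: 2016-11-25  Published: 2016-11-25
  • Funding:

    National Natural Science Foundation of China (61379057)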


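Abstract:

ID3 is currently one of the most influential decision tree construction algorithms, but it
still has several shortcomings: a bias toward multi-valued attributes, high computational
time complexity, and low efficiency. This paper proposes an optimized ID3 decision tree
construction algorithm based on Spearman's rank correlation coefficient that addresses the
weakness of ID3's attribute selection criterion. The correlation coefficient is used to
overcome ID3's bias toward multi-valued attributes, which improves classification accuracy
to a certain extent. The computation is also simplified using relevant mathematical results,
reducing the time ID3 spends on logarithm calculations. Finally, experiments verify that the
optimized algorithm is feasible and outperforms the original ID3 algorithm in both accuracy
and running speed.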
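For reference, the standard textbook definitions behind the terms above are sketched here: the entropy H(D) of a training set D with K classes (p_k is the proportion of class k), ID3's information gain for an attribute A whose values split D into subsets D_1, …, D_V, and Spearman's rank correlation ρ between two rankings of n samples. The abstract does not give the paper's own formulas, so this only shows the general quantities, not how the authors combine ρ with the gain or which logarithm simplification they use.

```latex
H(D) = -\sum_{k=1}^{K} p_k \log_2 p_k

\mathrm{Gain}(D, A) = H(D) - \sum_{v=1}^{V} \frac{|D_v|}{|D|}\, H(D_v)

\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n\,(n^2 - 1)}
```

Here d_i is the difference between the two ranks of sample i. The closed form for ρ assumes no tied ranks; with ties, ρ is computed as the Pearson correlation of the average ranks. The multi-value bias is visible in the gain formula: an attribute that places every sample in its own subset makes every H(D_v) equal to zero, so its gain reaches the maximum H(D) whether or not it generalizes.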
 

Key words: decision tree, ID3 algorithm, information entropy, Spearman Rank, correlation coefficients
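The bias toward multi-valued attributes can be illustrated on a toy example. The following is a minimal Python sketch, not the authors' algorithm: it computes ID3's information gain and Spearman's ρ for two hypothetical attributes, `row_id` (a distinct value for every sample) and `level` (an ordinal attribute with three values). It assumes attributes and class labels are numerically encoded so that ranking them is meaningful; the data and all helper names are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(D), in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels):
    """ID3 information gain from splitting `labels` on attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the subsets D_v produced by the split.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def average_ranks(xs):
    """1-based ranks of xs; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of average ranks (handles ties)."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0  # 0.0 if either input is constant


# Hypothetical toy data: binary class labels and two candidate attributes.
labels = [1, 0, 0, 1, 1, 0]
row_id = [1, 2, 3, 4, 5, 6]  # a distinct value per sample: the multi-valued extreme
level = [3, 1, 2, 2, 3, 1]   # ordinal attribute with three values

for name, attr in [("row_id", row_id), ("level", level)]:
    print(name, round(info_gain(attr, labels), 3), round(spearman(attr, labels), 3))
# row_id 1.0 -0.098
# level 0.667 0.816
```

Plain information gain prefers `row_id` (1.0 bit versus about 0.667), even though it merely memorizes the training samples, while |ρ| ranks `level` first (about 0.816 versus 0.098). How the paper actually folds ρ into the entropy-based criterion is not described in the abstract.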
