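  • Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science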

• Artificial Intelligence and Data Mining

An extended monotonic decision tree algorithm of interval-valued attributes

WANG Xin (1,2), CHEN Jian-kai (1,2), ZHAI Jun-hai (1,2)

  1. College of Mathematics and Information Science, Hebei University, Baoding 071002, China
  2. Hebei Province Key Laboratory in Machine Learning and Computational Intelligence, Baoding 071002, China

  • Received: 2019-04-22  Revised: 2019-08-29  Online: 2020-03-25  Published: 2020-03-25
  • Funding: Key Research and Development Fund of the Hebei Province Science and Technology Program (19210310D); Natural Science Foundation of Hebei Province (F2017201026); Social Science Foundation of Hebei Province (HB18GL010, HB19JY042)


Abstract:

The monotonic decision tree algorithm for interval-valued attributes is one of the important approaches to monotonic classification problems with interval-valued attributes. However, it does not take the correlation between attributes into account while building the tree, so it is likely to keep splitting on redundant attributes that contribute little or nothing to the classification. To address this shortcoming, this paper builds on that algorithm, analyzes how redundant information between interval-valued attributes affects the construction of a monotonic decision tree, and proposes an extended algorithm. The attribute chosen to expand a node is required to have not only the largest rank mutual information with the decision attribute but also the smallest rank mutual information with the condition attributes already selected on the same branch. Experimental results show that, once the correlation among condition attributes is taken into account, the extended algorithm avoids selecting the same condition attribute repeatedly, and that it builds better monotonic decision trees than existing algorithms.

Key words: interval-valued attribute, rank mutual information, correlation of attributes, monotonic decision tree
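The selection criterion described in the abstract can be sketched in Python. The sketch uses a common definition of ascending rank mutual information, RMI≤(B, C) = -(1/n) Σ_i log(|[x_i]≤_B| · |[x_i]≤_C| / (n · |[x_i]≤_B ∩ [x_i]≤_C|)), where [x_i]≤_B is the set of samples that dominate x_i on the attributes in B. Several details are assumptions, not taken from the paper: intervals are ordered endpoint-wise; the two criteria are combined as a difference (relevance minus redundancy, in the style of mRMR); redundancy is measured against the already-selected attributes taken jointly; and whole attributes are scored on one sample set, rather than candidate cut points on the samples reaching a node. The helper names `leq`, `upper_set`, `rmi`, and `select_attribute` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple, Union

# An interval-valued entry is (lower, upper); decision labels are plain numbers.
Value = Union[Tuple[float, float], float]


def leq(u: Value, v: Value) -> bool:
    # ASSUMPTION: intervals are compared endpoint-wise; the paper's own
    # interval order may differ. Plain numbers use the usual <=.
    if isinstance(u, tuple):
        return u[0] <= v[0] and u[1] <= v[1]
    return u <= v


def upper_set(columns: Sequence[Sequence[Value]], i: int, n: int) -> Set[int]:
    # Ascending dominance set [x_i]<=_B: samples j with x_i <= x_j on every
    # column of B. With B empty there is no constraint, so all n samples qualify.
    return {j for j in range(n) if all(leq(col[i], col[j]) for col in columns)}


def rmi(b_cols: Sequence[Sequence[Value]], c_cols: Sequence[Sequence[Value]], n: int) -> float:
    # Ascending rank mutual information RMI<=(B, C), natural logarithm.
    total = 0.0
    for i in range(n):
        b = upper_set(b_cols, i, n)
        c = upper_set(c_cols, i, n)
        # b & c always contains i because leq is reflexive, so no division by zero.
        total += math.log(len(b) * len(c) / (n * len(b & c)))
    return -total / n


def select_attribute(cond: Dict[str, List[Value]], decision: List[Value], used: List[str]) -> str:
    # HYPOTHETICAL scoring: relevance to the decision minus redundancy with the
    # attributes already used on this branch (taken jointly).
    n = len(decision)
    used_cols = [cond[name] for name in used]

    def score(name: str) -> float:
        relevance = rmi([cond[name]], [decision], n)
        # 0 when nothing is used yet. If `name` is already in `used`, this equals
        # its rank entropy, which bounds its relevance, so its score is <= 0.
        redundancy = rmi([cond[name]], used_cols, n)
        return relevance - redundancy

    return max(cond, key=score)


if __name__ == "__main__":
    cond = {
        "a1": [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)],
        "a2": [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)],  # copy of a1
        "a3": [(3.0, 4.0), (0.0, 1.0), (2.0, 3.0), (1.0, 2.0)],
    }
    decision = [1, 1, 2, 2]
    print(select_attribute(cond, decision, []))      # a1
    print(select_attribute(cond, decision, ["a1"]))  # a3
```

In this toy run, a plain maximum-RMI criterion would choose `a1` (or its copy `a2`) again at the second step, since both have the highest relevance. With the redundancy term, any attribute already on the branch scores at most zero, because its RMI with a set that contains it equals its own rank entropy, which is an upper bound on its RMI with the decision.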