• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2011, Vol. 33 ›› Issue (12): 130-135.

• Articles



  1. (School of Computer and Information Engineering, Yibin University, Yibin, Sichuan 644007)
  • Received: 2011-06-18 Revised: 2011-09-26 Online: 2011-12-24 Published: 2011-12-25

Bad: A Balanced Discretization Algorithm Based on the Minimum Description Length

HUANG Dong   

  1. (School of Computer and Information Engineering, Yibin University, Yibin 644007, China)
  • Received: 2011-06-18 Revised: 2011-09-26 Online: 2011-12-24 Published: 2011-12-25



Keywords: discretization, data mining, minimum description length, balanced function


Discretization of continuous data is an important preprocessing step for classification methods in data mining. This paper presents a balanced discretization algorithm based on the minimum description length (MDL) principle. It introduces a balanced discretization function, derived from the minimum description length, that measures the relationship between the discretized intervals and the classification errors. Building on this function, an effective heuristic discretization algorithm is proposed to find the optimal breakpoint sequence. Simulation results show that the proposed algorithm achieves better classification and learning ability with the C5.0 decision tree and the naive Bayesian classifier.

Key words: discretization; data mining; minimum description length (MDL); balanced function
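
The abstract does not give the balanced function or the heuristic search itself, so neither can be reproduced here. As a rough illustration of what MDL-based discretization looks like in general, the sketch below uses the classic Fayyad-Irani MDLP stopping criterion as a stand-in. It is not the Bad algorithm described in this paper, and all function names (`entropy`, `best_cut`, `mdl_accepts`, `mdlp_cut_points`) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (gain, index) of the boundary with the highest information gain.

    `values` must be sorted; index i splits the data into [:i] and [i:].
    Returns None when all values are identical.
    """
    n = len(values)
    base = entropy(labels)
    best = None
    for i in range(1, n):
        if values[i] == values[i - 1]:
            continue  # a cut is only possible between distinct values
        left, right = labels[:i], labels[i:]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if best is None or gain > best[0]:
            best = (gain, i)
    return best


def mdl_accepts(labels, i, gain):
    """Fayyad-Irani MDL test: accept the cut only if the gain pays for its cost."""
    n = len(labels)
    left, right = labels[:i], labels[i:]
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (math.log2(n - 1) + delta) / n


def mdlp_cut_points(values, labels):
    """Recursively split one continuous attribute; return sorted breakpoints."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    cuts = []

    def recurse(lo, hi):
        sub_x, sub_y = xs[lo:hi], ys[lo:hi]
        if len(sub_x) < 2:
            return
        found = best_cut(sub_x, sub_y)
        if found is None:
            return
        gain, i = found
        if not mdl_accepts(sub_y, i, gain):
            return
        cuts.append((sub_x[i - 1] + sub_x[i]) / 2)  # midpoint breakpoint
        recurse(lo, lo + i)
        recurse(lo + i, hi)

    recurse(0, len(xs))
    return sorted(cuts)


# Two well-separated classes yield a single breakpoint between them.
values = list(range(1, 21))
labels = ["a"] * 10 + ["b"] * 10
print(mdlp_cut_points(values, labels))  # [10.5]
```

In this stand-in, the MDL test plays the role that the abstract assigns to the balanced function: it trades the number of intervals against the classification information each cut provides. How the paper's function weighs the two is not stated in the abstract.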