• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4, 2011, Vol. 33, Issue (12): 130-135.

• Articles •

Bad: A Balanced Discretization Algorithm Based on the Minimum Description Length

HUANG Dong   

  1. (School of Computer and Information Engineering, Yibin University, Yibin 644007, China)
  • Received: 2011-06-18 Revised: 2011-09-26 Online: 2011-12-24 Published: 2011-12-25

Abstract:

Discretization of continuous data is an important preprocessing step for classification methods in data mining. This paper presents a balanced discretization algorithm based on the minimum description length (MDL) principle. It proposes a balanced discretization function, derived from the minimum description length, that measures the trade-off between the discretized intervals and the classification errors they introduce. Building on this function, the paper gives an effective heuristic algorithm that searches for the optimal breakpoint sequence. Simulation results show that data discretized by the proposed algorithm improve the classification and learning ability of the C5.0 decision tree and the naive Bayesian classifier.

Key words: discretization; data mining; minimum description length (MDL); balanced function
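
The abstract does not state the balanced discretization function itself, so the sketch below does not reproduce the paper's algorithm. As a hedged illustration of the general idea of MDL-guided breakpoint search, it uses a different, well-known criterion: the entropy-based MDL stopping rule of Fayyad and Irani, applied by recursive binary splitting. All function names (`entropy`, `best_cut`, `mdl_accepts`, `mdlp_cut_points`) are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (index, weighted entropy) of the best binary split, or None.

    `values` must be sorted; a split at index i separates [:i] from [i:].
    Only boundaries between distinct values are considered.
    """
    n = len(values)
    best = None
    for i in range(1, n):
        if values[i] == values[i - 1]:
            continue
        e = (i / n) * entropy(labels[:i]) + ((n - i) / n) * entropy(labels[i:])
        if best is None or e < best[1]:
            best = (i, e)
    return best


def mdl_accepts(labels, i, split_entropy):
    """Fayyad-Irani MDL test: accept the split only if the information gain
    exceeds the cost of encoding the extra breakpoint."""
    n = len(labels)
    left, right = labels[:i], labels[i:]
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    gain = entropy(labels) - split_entropy
    delta = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (math.log2(n - 1) + delta) / n


def mdlp_cut_points(values, labels):
    """Recursively split one continuous attribute; return sorted breakpoints."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    cuts = []

    def recurse(lo, hi):
        sub_x, sub_y = xs[lo:hi], ys[lo:hi]
        if len(sub_x) < 2:
            return
        found = best_cut(sub_x, sub_y)
        if found is None:
            return
        i, e = found
        if not mdl_accepts(sub_y, i, e):
            return
        # Place the breakpoint midway between the two neighbouring values.
        cuts.append((sub_x[i - 1] + sub_x[i]) / 2)
        recurse(lo, lo + i)
        recurse(lo + i, hi)

    recurse(0, len(xs))
    return sorted(cuts)
```

For instance, `mdlp_cut_points(list(range(1, 11)), ["a"] * 5 + ["b"] * 5)` returns `[5.5]`: a single breakpoint separates the two classes, and no further split passes the MDL test. A balanced function of the kind the paper describes would presumably replace `mdl_accepts` with a criterion that also weighs classification errors, but that is an assumption, not something the abstract confirms.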