• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2008, Vol. 30 ›› Issue (4): 20-22.

• Paper •

An Improved Naive Bayes Method for Hierarchical Text Categorization

张博锋[1] 苏金树[1] 徐昕[1,2]   

  • Online:2008-04-01 Published:2010-05-19

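Abstract (translated from the Chinese):

In text categorization, the decisions of the Naive Bayes method depend strongly on the subjectively chosen distribution of samples over classes. This paper exploits the characteristics of hierarchical classification and introduces a probability condition to improve the Naive Bayes method, so that it makes decisions within the local data of the subclasses belonging to each internal class. This mitigates the influence of the global data distribution on the classifier and partially overcomes the data skewness problem. Experiments show that, in hierarchical classification, the improved method performs significantly better than the Naive Bayes method.

Keywords: text categorization, hierarchical categorization, Naive Bayes, machine learning, data skewness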

Abstract:

The Naive Bayes method for text categorization depends heavily on the subjectively selected distribution of samples over classes. This paper enhances it by exploiting the characteristics of hierarchical classification and introducing conditional probability. The enhanced method makes each decision within the local data belonging to the subclasses of an internal class, which lessens the influence of the global data distribution and partially overcomes the problem of data skewness. Experiments show that the enhanced method is notably more effective than the Naive Bayes method in hierarchical categorization.

Key words: text categorization, hierarchical categorization, Naive Bayes, machine learning, data skewness
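
The abstract gives only a high-level description, so the following is a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that "making decisions in the local data" means training a separate multinomial Naive Bayes classifier at each internal class, using only the documents under that class, so that priors and word likelihoods come from local rather than global counts. The names `LocalNB`, `train_hierarchy`, and `classify`, the Laplace smoothing, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


class LocalNB:
    """Multinomial Naive Bayes with Laplace smoothing (hypothetical helper).

    It is trained only on the documents that fall under one internal class,
    so its priors reflect the local, not the global, class distribution.
    """

    def __init__(self, docs, labels):
        self.prior = Counter(labels)            # local class counts
        self.n = len(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, tokens):
        best, best_score = None, -math.inf
        v = len(self.vocab)
        for label, count in self.prior.items():
            total = sum(self.word_counts[label].values())
            score = math.log(count / self.n)    # log of the local prior
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / (total + v))
            if score > best_score:
                best, best_score = label, score
        return best


def train_hierarchy(docs, paths):
    """Train one LocalNB per internal node.

    paths[i] is the root-to-leaf tuple of class names for docs[i];
    the empty tuple () stands for the root.
    """
    local = defaultdict(lambda: ([], []))
    for tokens, path in zip(docs, paths):
        for depth, child in enumerate(path):
            node = path[:depth]
            local[node][0].append(tokens)
            local[node][1].append(child)
    return {node: LocalNB(d, l) for node, (d, l) in local.items()}


def classify(models, tokens):
    """Descend from the root, choosing among the current node's children only."""
    node = ()
    while node in models:
        node = node + (models[node].predict(tokens),)
    return node


# Toy data, invented purely for illustration.
toy_docs = [
    ["goal", "match", "team"],
    ["goal", "striker", "league"],
    ["serve", "racket", "match"],
    ["quantum", "particle", "energy"],
    ["cell", "gene", "protein"],
]
toy_paths = [
    ("sports", "football"),
    ("sports", "football"),
    ("sports", "tennis"),
    ("science", "physics"),
    ("science", "biology"),
]
toy_models = train_hierarchy(toy_docs, toy_paths)
print(classify(toy_models, ["racket", "serve"]))
```

In this sketch the "tennis" class has a single training document out of five globally, but it is compared only against "football" once the root classifier has chosen "sports". This is the kind of local decision the abstract credits with reducing the effect of skewed global distributions.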