  • Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal
Article

A Probabilistic Topic Model Based Noise Processing Method for Text Classification

  • (School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China)

Received date: 2009-01-13

Revised date: 2009-05-18

Published online: 2010-06-25

Abstract

The performance of text classification depends directly on the quality of the training corpus. In practical applications, noise samples in the training corpus are unavoidable and degrade the performance of text classifiers. To address this, a novel noise processing method based on a probabilistic topic model is proposed for text classification. The method first filters noise samples according to their class entropy. It then smooths the data using the generative process of the topic model, which further weakens the influence of noise samples while keeping the training corpus at its original size. Experimental results on real-world data show that the proposed method is robust to the distribution of noise samples and performs relatively well on data sets with a high noise ratio.
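
The filtering step can be illustrated with a minimal sketch. The abstract does not say how class entropy is computed or how the cutoff is chosen, so everything here is an assumption made for illustration: each training document is assumed to come with an estimated class-probability distribution (for example, one derived from a topic model), entropy is the Shannon entropy in bits, and documents above a hypothetical `threshold` are treated as noise. The smoothing step is not shown.

```python
import math

def class_entropy(probs):
    """Shannon entropy (in bits) of a class-probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def filter_by_entropy(samples, threshold):
    """Keep the ids of samples whose class entropy is at most `threshold`.

    `samples` is a list of (doc_id, class_probs) pairs. Both this input
    layout and the thresholding rule are assumptions, not the paper's method.
    """
    return [doc_id for doc_id, probs in samples
            if class_entropy(probs) <= threshold]

# Hypothetical per-document class distributions over three classes.
samples = [
    ("d1", [0.9, 0.05, 0.05]),   # concentrated on one class
    ("d2", [0.34, 0.33, 0.33]),  # nearly uniform, so treated as likely noise
    ("d3", [1.0, 0.0, 0.0]),     # fully certain
]
kept = filter_by_entropy(samples, threshold=1.0)
```

A document whose probability mass is spread evenly across classes has high entropy and is dropped, while one concentrated on a single class is kept. In this sketch the filtered corpus is smaller than the original; according to the abstract, the subsequent smoothing step is what keeps the training corpus at its original size.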

Cite this article

LIN Yanggang,CHEN Enhong . A Probabilistic Topic Model Based Noise Processing Method for Text Classification[J]. Computer Engineering & Science, 2010 , 32(7) : 89 -92 . DOI: 10.3969/j.issn.1007130X.2010.
