Computer Engineering & Science
A Probabilistic Topic Model Based Noise Processing Method for Text Classification
Received date: 2009-01-13
Revised date: 2009-05-18
Online published: 2010-06-25
The performance of text classification depends directly on the quality of the training corpus. In practical applications, noise samples in the training corpus are unavoidable and degrade the effectiveness of text classification approaches. To address this, a novel noise processing method based on a probabilistic topic model is proposed for text classification. In our method, noise samples are first filtered according to their class entropy. The data are then smoothed using the generative process of the topic model, which further weakens the influence of noise samples while keeping the original size of the training corpus. Experimental results on real-world data show that the proposed method is robust to the distribution of noise samples and performs relatively well on data sets with a high noise ratio.
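The two stages in the abstract (class-entropy filtering, then topic-model smoothing that keeps the corpus size) can be illustrated with a minimal sketch. The abstract does not say how class entropy is computed or how the topic model is trained, so everything below is an assumption: each sample is assumed to come with a class posterior P(c | d) from some classifier, samples whose entropy exceeds a fixed threshold are treated as noise, per-class topic-word distributions are assumed to be already estimated (for example by LDA), and each removed sample is replaced by a document drawn through the standard LDA generative process for the same label and length. All function names (`class_entropy`, `filter_by_entropy`, `generate_document`, `smooth_corpus`) are hypothetical and are not the authors' implementation.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# A labelled training sample: (class label, list of word tokens).
Sample = Tuple[str, List[str]]


def class_entropy(posterior: Sequence[float]) -> float:
    """Shannon entropy (nats) of a class posterior P(c | d)."""
    return -sum(p * math.log(p) for p in posterior if p > 0.0)


def filter_by_entropy(samples: List[Sample], posteriors: List[Sequence[float]],
                      threshold: float) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into (kept, removed); high-entropy samples are assumed to be noise."""
    kept: List[Sample] = []
    removed: List[Sample] = []
    for sample, post in zip(samples, posteriors):
        (kept if class_entropy(post) <= threshold else removed).append(sample)
    return kept, removed


def sample_dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Draw from a symmetric Dirichlet(alpha) by normalising Gamma(alpha, 1) variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow for very small alpha
        return [1.0 / k] * k
    return [d / total for d in draws]


def generate_document(topic_word: List[List[float]], vocab: List[str], alpha: float,
                      length: int, rng: random.Random) -> List[str]:
    """LDA generative process: theta ~ Dir(alpha); per token z ~ Cat(theta), w ~ Cat(phi_z)."""
    theta = sample_dirichlet(alpha, len(topic_word), rng)
    doc = []
    for _ in range(length):
        z = rng.choices(range(len(topic_word)), weights=theta)[0]
        doc.append(rng.choices(vocab, weights=topic_word[z])[0])
    return doc


def smooth_corpus(kept: List[Sample], removed: List[Sample],
                  topic_word_by_class: Dict[str, List[List[float]]], vocab: List[str],
                  alpha: float = 0.5, seed: int = 0) -> List[Sample]:
    """Replace each removed sample with a generated document of the same label
    and length, so the corpus keeps its original size (assumed replacement rule)."""
    rng = random.Random(seed)
    synthetic = [
        (label, generate_document(topic_word_by_class[label], vocab, alpha, len(tokens), rng))
        for label, tokens in removed
    ]
    return kept + synthetic


if __name__ == "__main__":
    # Toy data: one topic per class; rows of phi are distributions over vocab.
    vocab = ["goal", "match", "vote", "party"]
    phi = {"sport": [[0.5, 0.5, 0.0, 0.0]], "politics": [[0.0, 0.0, 0.5, 0.5]]}
    samples = [("sport", ["goal", "match"]),
               ("politics", ["goal", "goal", "match"]),  # likely mislabelled
               ("politics", ["vote", "party"])]
    posteriors = [[0.9, 0.1], [0.5, 0.5], [0.05, 0.95]]
    kept, removed = filter_by_entropy(samples, posteriors, threshold=0.5)
    for label, tokens in smooth_corpus(kept, removed, phi, vocab):
        print(label, tokens)
```

In this toy run the uncertain second sample (entropy ln 2, about 0.69) exceeds the threshold and is replaced by a "politics" document of the same length drawn only from that class's topic, so class sizes and the overall corpus size are unchanged. Replacing rather than simply deleting is one way to honour the abstract's goal of keeping the original training-corpus size; the actual replacement rule used in the paper may differ.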
LIN Yanggang, CHEN Enhong. A Probabilistic Topic Model Based Noise Processing Method for Text Classification[J]. Computer Engineering & Science, 2010, 32(7): 89-92. DOI: 10.3969/j.issn.1007130X.2010.