A New Bayesian Classification Algorithm for Non-Balance Datasets
Received date: 2009-03-13
Revised date: 2009-08-26
Online published: 2010-06-25
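WANG Chunliang (汪春亮), FU Yuchen (伏玉琛). A New Bayesian Classification Algorithm for Non-Balance Datasets [J]. Computer Engineering & Science (计算机工程与科学), 2010, 32(7): 95-98. DOI: 10.3969/j.issn.1007130X.2010.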
Based on the idea of semi-supervised learning, a new Bayesian classifier model that uses an improved EM (Expectation-Maximization) algorithm is proposed to classify and predict imbalanced data gathered from mobile communication networks. First, a statistical analysis of the actual data is performed to calculate the prior probabilities. Using these prior probabilities as the initial values of the Bayesian model speeds up the convergence of the EM algorithm. Second, a classifier based on the Bayesian network is constructed, which learns the category characteristics of the historical communication data through the improved EM steps. Third, this classifier is used to predict the label of the current data sample. The experimental results demonstrate that the proposed method substantially increases the prediction accuracy for the negative label and outperforms the traditional statistical methods.
Key words: semi-supervised learning; Bayes
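
The three steps in the abstract (priors estimated from the data, EM training on partly labeled data, then prediction) can be illustrated with a minimal sketch. It is not the authors' improved EM procedure, which the abstract does not specify. It assumes a two-class Bernoulli naive Bayes model over binary features with Laplace smoothing; the function names `m_step`, `posterior`, `fit`, `predict` and the toy data are hypothetical.

```python
import math

def m_step(samples, weights, alpha=1.0):
    """Re-estimate class priors and per-feature Bernoulli probabilities.

    weights[i] is the (soft) probability that samples[i] belongs to class 1.
    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    n_feat = len(samples[0])
    priors, thetas = [], []
    for c in (0, 1):
        w = [wi if c == 1 else 1.0 - wi for wi in weights]
        total = sum(w)
        priors.append((total + alpha) / (len(samples) + 2 * alpha))
        thetas.append([
            (sum(wi * x[j] for wi, x in zip(w, samples)) + alpha)
            / (total + 2 * alpha)
            for j in range(n_feat)
        ])
    return priors, thetas

def posterior(model, x):
    """E-step for one sample: P(class = 1 | x) under the current model."""
    priors, thetas = model
    log_p = []
    for c in (0, 1):
        lp = math.log(priors[c])
        for xj, t in zip(x, thetas[c]):
            lp += math.log(t if xj else 1.0 - t)
        log_p.append(lp)
    m = max(log_p)  # subtract the max for numerical stability
    e0, e1 = math.exp(log_p[0] - m), math.exp(log_p[1] - m)
    return e1 / (e0 + e1)

def fit(labeled_x, labeled_y, unlabeled_x, n_iter=20):
    """Semi-supervised EM: labeled samples keep their labels, unlabeled
    samples start from the empirical class frequency of the labeled data."""
    prior1 = sum(labeled_y) / len(labeled_y)
    soft = [prior1] * len(unlabeled_x)
    samples = labeled_x + unlabeled_x
    model = None
    for _ in range(n_iter):
        model = m_step(samples, list(labeled_y) + soft)
        soft = [posterior(model, x) for x in unlabeled_x]
    return model

def predict(model, x):
    """Return the more probable class label (0 or 1) for x."""
    return 1 if posterior(model, x) >= 0.5 else 0

# Toy imbalanced data: class 1 is the minority (one labeled example).
labeled_x = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 1, 0], [0, 0, 1]]
labeled_y = [0, 0, 0, 0, 1]
unlabeled_x = [[1, 1, 0], [0, 0, 1], [0, 1, 1], [1, 0, 0]]
model = fit(labeled_x, labeled_y, unlabeled_x)
```

Starting the unlabeled samples at the empirical class frequency mirrors the idea of seeding EM with priors computed from the data; whether this matches the paper's initialization in detail is an assumption.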