Computer Engineering & Science
TANG Li, HE Li
Abstract:
We propose an assessment index system and a method based on PAC-Bayes theory for assessing the data quality of Web articles. PAC-Bayes theory combines the Probably Approximately Correct (PAC) framework with the Bayesian paradigm: it makes full use of prior information about the samples and derives some of the tightest generalization bounds for assessing the generalization capability of classifiers. We first review the state of research on data quality assessment of articles in detail, and then introduce the theoretical framework of PAC-Bayes theory and its application to support vector machines (SVMs). We then propose a PAC-Bayes-based method for data quality assessment of Web articles (DQAPB), which applies the SVM algorithm and its PAC-Bayes bound to this task, and we establish a corresponding quality assessment index system for Web articles. Experiments on Wikipedia documents show that the proposed method is simple and fast, with strong stability and robustness.
Key words: PAC-Bayes bound, support vector machine (SVM), generalization capability, data quality assessment
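The abstract does not state which form of the PAC-Bayes bound the authors use. As a minimal sketch, assuming the commonly cited Langford-Seeger form, kl(empirical error || true error) <= (KL(Q||P) + ln((m+1)/delta)) / m, the bound on the true error of a stochastic classifier can be computed by numerically inverting the binary KL divergence. The function names and the example values below are illustrative assumptions, not taken from the paper.

```python
import math

def binary_kl(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # keep p away from 0 and 1 to avoid log(0)
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total

def kl_inverse_upper(q, budget, iters=100):
    """Largest p in [q, 1] with binary_kl(q, p) <= budget, found by bisection."""
    lo, hi = q, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if binary_kl(q, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

def pac_bayes_bound(emp_err, kl_posterior_prior, m, delta=0.05):
    """Upper bound on the true error under the assumed Langford-Seeger form.

    emp_err            -- empirical error of the stochastic classifier
    kl_posterior_prior -- KL(Q || P) between posterior and prior
    m                  -- number of training samples
    delta              -- the bound holds with probability at least 1 - delta
    """
    budget = (kl_posterior_prior + math.log((m + 1) / delta)) / m
    return kl_inverse_upper(emp_err, budget)

# Hypothetical values: for an SVM with unit-variance Gaussian prior and
# posterior, KL(Q || P) = mu**2 / 2, where mu scales the unit weight vector.
mu = 2.0
bound = pac_bayes_bound(emp_err=0.10, kl_posterior_prior=mu**2 / 2, m=1000)
```

A smaller bound indicates better guaranteed generalization, which is what makes it usable as a model-selection or assessment criterion.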
URL: http://joces.nudt.edu.cn/EN/
http://joces.nudt.edu.cn/EN/Y2017/V39/I3/572