Computer Engineering & Science
A Study of the Question Classification Task in Community-Based Q&A Services
Received date: 2009-12-21
Revised date: 2010-04-17
Online published: 2011-01-25
In community-based Q&A services (cQA) such as Baidu Zhidao, question classification is a crucial task because it organizes the questions submitted to the system. A question classification algorithm for a cQA service needs high accuracy, low computational cost, and low sensitivity to noise. Building on the Kullback-Leibler distance (KLD) classification algorithm, this paper introduces a new question classification approach, named n-gram KLD, that adopts the idea of the language model. Experimental results on a large corpus of more than 1 million question-answer pairs show a significant improvement when the n-gram KLD algorithm is used, and indicate that the algorithm meets the practical demands of question classification in cQA services.
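The abstract does not spell out the n-gram KLD formulation, so the following is only a minimal sketch of the general idea it names: each category is modeled as an n-gram distribution, and a question is assigned to the category whose distribution has the smallest Kullback-Leibler divergence from the question's own. Character-level n-grams, additive smoothing with `alpha`, and all function names here are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Split text into overlapping character n-grams (assumed tokenization)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(labeled_questions, n=2):
    """Accumulate one n-gram count table per category."""
    model = {}
    for text, category in labeled_questions:
        model.setdefault(category, Counter()).update(char_ngrams(text, n))
    return model


def distribution(counts, vocab, alpha=0.5):
    """Turn counts into probabilities over vocab.

    Additive smoothing (an assumption) keeps every probability nonzero,
    so the logarithm in the KL divergence is always defined.
    """
    total = sum(counts[g] for g in vocab) + alpha * len(vocab)
    return {g: (counts[g] + alpha) / total for g in vocab}


def kl_divergence(p, q):
    """D(p || q) for two distributions defined over the same keys."""
    return sum(p[g] * math.log(p[g] / q[g]) for g in p)


def classify(question, model, n=2, alpha=0.5):
    """Return the category whose n-gram model is closest to the question."""
    q_counts = Counter(char_ngrams(question, n))
    vocab = set(q_counts)
    for counts in model.values():
        vocab.update(counts)
    p = distribution(q_counts, vocab, alpha)
    return min(
        model,
        key=lambda cat: kl_divergence(p, distribution(model[cat], vocab, alpha)),
    )


if __name__ == "__main__":
    # Toy training data, invented purely for illustration.
    toy = [
        ("how do I install python on windows", "computers"),
        ("how long should I bake bread", "cooking"),
    ]
    print(classify("how to bake a cake", train(toy)))
```

Scoring a question costs one pass over the vocabulary per category, which is in keeping with the low-computation requirement stated above; a production system would precompute the category distributions rather than rebuild them on every call.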
WANG Junze, HUANG Benxiong, HU Guang, WEN Jie. A Study of the Question Classification Task in Community-Based Q&A Services[J]. Computer Engineering & Science, 2011, 33(1): 143-149. DOI: 10.3969/j.issn.1007-130X.2011.