• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal
Article

A Study of the Question Classification Task in Community-Based Q&A Services

  • (Department of Electronics and Information Engineering,
    Huazhong University of Science and Technology, Wuhan 430074, China)

Received date: 2009-12-21

  Revised date: 2010-04-17

  Online published: 2011-01-25

Abstract

In Communitybased Q&A services(referred to as cQA) such as Baidu Zhidao, question classification is one of the crucial tasks and it is important to organize the questions submitted to the cQA system. The question categorization algorithm for the cQA service needs to get high accuracy, low computation and lowsensitivity to noise. Based on the kullbackLeibler distance classification algorithm, this paper introduces a new question classification approach adopting the idea of language model, named ngram KLD. The experimental results with a large corpus which contains more than 1 million questionanswer pairs show a significant improvement when the ngram KLD algorithm is used. And the ngram KLD algorithm is fit for the actual demand of the question classification task in the cQA service.

Cite this article

WANG Junze,HUANG Benxiong,HU Guang,WEN Jie . A Study of the Question Classification Task in CommunityBased Q&A Services[J]. Computer Engineering & Science, 2011 , 33(1) : 143 -149 . DOI: 10.3969/j.issn.1007130X.2011.

