• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2010, Vol. 32 ›› Issue (1): 136-140. doi: 10.3969/j.issn.1007130X.2010.

• Articles

Research on SLMIR and Its Smoothing Techniques in Chinese QA Systems

  1. (Department of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China)
  • Received: 2008-08-02 Revised: 2008-11-18 Published: 2010-01-18
  • Corresponding author: 钱如栏 (Qian Rulan) E-mail: 663783@163.com
  • About the authors: 钱如栏 (Qian Rulan, born 1983), female, from Huzhou, Zhejiang, is a master's student whose research interests are computer networks and information systems; 董云耀 (Dong Yunyao) is an associate professor whose research interests are computer networks and information systems.

Abstract:

To accommodate the characteristics of the Chinese language in Chinese question answering (QA) systems, this paper analyzes the information retrieval module in depth. Compared with the traditional mainstream information retrieval models, we identify a more effective retrieval method: SLMIR, an information retrieval technique based on statistical language modeling (SLM). In addition, we study the choice of the order N in the N-gram model and several of its main data smoothing techniques. By comparing these smoothing methods experimentally, we discuss the factors that affect their performance, such as the size of the training set. Finally, we give the best choice of smoothing technique under different conditions.

Key words: information retrieval; statistical language model; N-gram; SLMIR; smoothing technique
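
The abstract does not say which smoothing methods or which value of N the paper settles on, so the following is only a minimal sketch of the general idea behind language-model-based retrieval, not the authors' implementation. It assumes a character-level unigram model (N = 1, no word segmentation) and Jelinek-Mercer interpolation with a hypothetical weight `lam`; the function names and the toy documents are invented for illustration.

```python
import math
from collections import Counter


def jm_log_likelihood(query, doc, collection, lam=0.5):
    """Log P(query | doc) for a character unigram model with Jelinek-Mercer smoothing.

    P(c | doc) = (1 - lam) * count(c, doc) / |doc|
                 + lam * count(c, collection) / |collection|
    Assumption: each character is one token; lam is an illustrative weight.
    """
    doc_counts = Counter(doc)
    coll_counts = Counter(c for d in collection for c in d)
    doc_len = sum(doc_counts.values())
    coll_len = sum(coll_counts.values())
    score = 0.0
    for c in query:
        p_doc = doc_counts[c] / doc_len if doc_len else 0.0
        p_coll = coll_counts[c] / coll_len if coll_len else 0.0
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # Zero probability: without smoothing, one unseen character
            # rules the document out entirely.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, collection, lam=0.5):
    """Return document indices ordered from most to least likely to generate the query."""
    scores = [jm_log_likelihood(query, d, collection, lam) for d in collection]
    return sorted(range(len(collection)), key=lambda i: scores[i], reverse=True)


# Toy collection (hypothetical data, not from the paper).
docs = ["信息检索模型", "数据平滑技术", "语言模型平滑"]
ranking = rank("平滑技术", docs)
```

With `lam=0` the model is unsmoothed, and any document missing a single query character scores negative infinity. A positive `lam` borrows probability mass from the whole collection, which is the role smoothing plays in this kind of model.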

CLC number: