Paper

Research on Text Retrieval Based on Weighting Query Expansion Terms

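  • (Information Technology Teaching and Research Section, Guangzhou Commanding Institute of the Armed Police of China, Guangzhou 510440, Guangdong)
ZHANG Yinghai (张映海, born 1976), male, from Renhuai, Guizhou; M.S., lecturer; research interest: natural language processing. ZHANG Yuwei (张宇薇, born 1973), female, from Chongqing; M.S., lecturer; research interest: computer-assisted education.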

Received: 2010-01-28

  Revised: 2010-05-21

  Published online: 2011-01-25

Funding

National Natural Science Foundation of China (grant 60173060)

Research on the Text Retrieval Based on the Weight of the Expanding Query Term

  • (Information Technology Office,
    Guangzhou Commanding Institute of the Armed Police of China,Guangzhou 510440,China)

Received date: 2010-01-28

  Revised date: 2010-05-21

  Online published: 2011-01-25

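Abstract

This paper analyzes keyword-based text retrieval, whose recall is low because the query terms are not expanded, and concept-based text retrieval, which does expand some query terms but does not distinguish the weights of the expansion terms from those of the original query terms. To address this, the paper uses the semantic similarity between terms to propose a method for computing the weights of query expansion terms, called expansion lessening. The query terms and the expansion terms, weighted by expansion lessening, are then used to build a vector space model for text retrieval. Experiments show that the resulting retrieval model greatly improves overall retrieval performance.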

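Cite this article

ZHANG Yinghai, ZHANG Yuwei. Research on the Text Retrieval Based on the Weight of the Expanding Query Term[J]. Computer Engineering & Science, 2011, 33(1): 161-165. DOI: 10.3969/j.issn.1007-130X.2011.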

Abstract

Keyword-based text retrieval is analyzed; its recall is low because the query is not expanded. In concept-based text retrieval, the weights of the expansion terms are not distinguished from those of the original query words. A method named expansion lessening is therefore proposed, which computes the weight of each query expansion term from the semantic similarity between words. The query words and the expansion terms, weighted by expansion lessening, are used to construct a vector space model for text search. Experiments show that the overall performance of the resulting retrieval model in text search is greatly improved.
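
The abstract does not give the expansion lessening formula, so the following Python sketch only illustrates the general idea under stated assumptions: original query terms keep weight 1.0, each expansion term takes its semantic similarity to the query term (a value in [0, 1]) as its weight, and documents are ranked by cosine similarity in a term-frequency vector space. The function names, the `threshold` parameter, and the hand-written similarity table are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def expand_query(query_terms, similar_terms, threshold=0.5):
    """Build {term: weight} for a query (illustrative weighting, not the paper's formula).

    Original terms get weight 1.0. A candidate expansion term gets its
    similarity score as its weight, so it always counts less than an
    original term. Candidates below `threshold` are dropped.
    """
    weights = {term: 1.0 for term in query_terms}
    for term in query_terms:
        for candidate, sim in similar_terms.get(term, {}).items():
            if candidate in query_terms or sim < threshold:
                continue
            # Keep the highest similarity if several query terms suggest it.
            weights[candidate] = max(weights.get(candidate, 0.0), sim)
    return weights


def cosine(query_weights, doc_tf):
    """Cosine similarity between a weighted query and a term-frequency vector."""
    dot = sum(w * doc_tf.get(t, 0) for t, w in query_weights.items())
    q_norm = math.sqrt(sum(w * w for w in query_weights.values()))
    d_norm = math.sqrt(sum(v * v for v in doc_tf.values()))
    if q_norm == 0 or d_norm == 0:
        return 0.0
    return dot / (q_norm * d_norm)


def rank(query_terms, docs, similar_terms, threshold=0.5):
    """Return [(score, doc_id)] sorted from best to worst match."""
    query_weights = expand_query(query_terms, similar_terms, threshold)
    scored = [(cosine(query_weights, Counter(tokens)), doc_id)
              for doc_id, tokens in docs.items()]
    return sorted(scored, reverse=True)


# Hypothetical toy data: a stand-in for a real semantic similarity measure.
SIMILAR = {"car": {"automobile": 0.8, "vehicle": 0.4}}
DOCS = {
    "d1": ["car", "engine"],
    "d2": ["automobile", "engine"],
    "d3": ["banana"],
}

if __name__ == "__main__":
    for score, doc_id in rank(["car"], DOCS, SIMILAR):
        print(doc_id, round(score, 3))
```

In this toy setup, document `d2` never mentions the query word itself, yet it still receives a nonzero score through the expansion term, while ranking below `d1`, which matches the original term. Setting `threshold` above 1.0 disables expansion and reduces the sketch to plain keyword matching.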

