• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2007, Vol. 29 ›› Issue (11): 81-83.

• Article •

Research on an Unknown-Word Recognition Algorithm for GIS Chinese Query Sentences

吴振南[1] 熊皓[2] 徐爱萍[2]   

  • Publication date: 2007-11-01 Posted online: 2010-05-30

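Abstract (translated from the Chinese):

GIS Chinese query sentences are interpreted in order to construct query statements. The content and structure of the corpus differ from one application system to another, and no corpus can exhaust every word used in query sentences. This paper therefore studies an unknown-word recognition algorithm for GIS Chinese query sentences based on the system corpus. Recognized unknown words are added to the corpus through human-computer interaction, so that segmentation of input query sentences gains an automatic memory function. Test results show that the algorithm is correct and effective, laying a foundation for the correct understanding of GIS Chinese query sentences.

Keywords (translated from the Chinese): GIS; corpus; word segmentation; unknown word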

Abstract:

The purpose of understanding GIS Chinese query sentences is to construct query statements. Different application systems have corpora that differ in content and structure, and no corpus can include every word used in query statements. This paper presents an algorithm that identifies unknown words based on the GIS system corpus. The identified unknown words can be added to the corpus through human-computer interaction, so that they are remembered in later segmentation. Experimental results show that the algorithm is correct and effective. It lays a foundation for computers to understand GIS Chinese query statements accurately.

Key words: geographic information system; corpus; word segmentation; unknown word
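
The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general idea it describes: segment a query against a system lexicon, treat runs of uncovered characters as candidate unknown words, and add confirmed candidates to the lexicon so that later queries segment correctly. Forward maximum matching is swapped in here as an assumed segmentation strategy; it is not confirmed as the authors' method. The names `segment`, `learn`, `confirm`, and `max_len`, as well as the sample lexicon and query, are hypothetical and chosen for illustration.

```python
def segment(query, lexicon, max_len=6):
    """Forward maximum matching (an assumed strategy, not the paper's).

    Returns (tokens, unknown), where unknown lists the runs of
    consecutive characters that no lexicon entry covers.
    """
    tokens, unknown = [], []
    pending = ""  # current run of characters not covered by the lexicon
    i = 0
    while i < len(query):
        match = None
        # Try the longest candidate first, then shrink.
        for n in range(min(max_len, len(query) - i), 0, -1):
            if query[i:i + n] in lexicon:
                match = query[i:i + n]
                break
        if match is None:
            pending += query[i]
            i += 1
            continue
        if pending:  # a known word ends the uncovered run
            tokens.append(pending)
            unknown.append(pending)
            pending = ""
        tokens.append(match)
        i += len(match)
    if pending:
        tokens.append(pending)
        unknown.append(pending)
    return tokens, unknown


def learn(query, lexicon, confirm):
    """Segment a query and add user-confirmed unknown words to the lexicon.

    `confirm` stands in for the human-computer interaction step: it
    receives a candidate word and returns True to accept it.
    """
    tokens, unknown = segment(query, lexicon)
    for word in unknown:
        if confirm(word):
            lexicon.add(word)  # the "memory" effect for later queries
    return tokens


# Hypothetical lexicon and query, for illustration only.
lexicon = {"查询", "附近", "的", "医院"}
query = "查询光谷附近的医院"
print(segment(query, lexicon))  # (['查询', '光谷', '附近', '的', '医院'], ['光谷'])
learn(query, lexicon, confirm=lambda word: True)
print(segment(query, lexicon))  # (['查询', '光谷', '附近', '的', '医院'], [])
```

In this sketch the confirmation callback always accepts; an interactive system would presumably prompt the user instead, and might also let the user correct the boundaries of a candidate before storing it.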