• Official journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• Artificial Intelligence and Data Mining

A Chinese address recognition method based on address semantic understanding

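LI Xiaolin1, ZHANG Yi1, LI Lin2

  1. Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, Hubei 430205
  2. School of Resource and Environmental Science, Wuhan University, Wuhan, Hubei 430079
  • Received: 2017-12-25  Revised: 2018-08-15  Online: 2019-03-25  Published: 2019-03-25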
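  • Funding:

    National Key Research and Development Program of China under the 13th Five-Year Plan (2017YFB0503701); National High Technology Research and Development Program of China (863 Program) (2013AA12A202); Special Scientific Research Fund for Public Welfare in Surveying, Mapping and Geoinformation (201412014); Natural Science Foundation of Hubei Province (2013CFA125)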

A Chinese address recognition method based on address semantics
 

LI Xiaolin1,ZHANG Yi1,LI Lin2   

  1. (1.Hubei Key Laboratory of Intelligent Robot,Wuhan Institute of Technology,Wuhan 430205;
    2.School of Resource and Environmental Science,Wuhan University,Wuhan 430079,China)
     
  • Received:2017-12-25 Revised:2018-08-15 Online:2019-03-25 Published:2019-03-25

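Abstract:

Chinese address text on the Internet contains rich spatial location information. To obtain the address location information in such text more effectively, this paper proposes an address location information recognition method based on address semantic understanding. Word frequency statistics over a training corpus are used to derive the set of address-element feature characters and the character transition probabilities, from which a feature-character transition probability matrix is constructed. Combined with a string maximum joint probability algorithm, this yields an address recognition method that depends on neither a place-name dictionary nor part-of-speech tagging. Experimental results show that, for Chinese addresses that have prominent element feature characters and contain ambiguity, the method achieves an exact match rate of 76.85% and a recognition accuracy of 93.11%. Finally, comparative experiments against a mechanical matching algorithm and a method that constructs the transition probability matrix from experience demonstrate the usability and effectiveness of the method.

Keywords: address semantics, element feature characters, transition probability, dictionary-free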

Abstract:

A large amount of Chinese address text on the Internet contains rich spatial location information. To extract the address location information from such text more effectively, we propose a Chinese address recognition method based on address semantics. From word frequency statistics over a training corpus, we derive a set of address-element feature characters and the character transition probabilities, and then construct a feature-character transition probability matrix. By combining this matrix with a maximum joint probability algorithm over strings, we obtain an address recognition method that depends on neither a place-name dictionary nor part-of-speech tagging. Experimental results show that, for ambiguous Chinese addresses with prominent feature characters, the method achieves an exact match rate of 76.85% and a recognition accuracy of 93.11%. Comparative experiments against a mechanical matching algorithm and a method that constructs the transition probability matrix from experience verify the feasibility and effectiveness of the proposed method.

Key words: address semantics, element feature character, transition probability, dictionary-free
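
The pipeline outlined in the abstract (feature characters, a transition probability matrix, and a maximum joint probability search) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so everything here is an assumption made for illustration, including the tiny hand-segmented `TRAINING` corpus, the choice of each element's last character as its feature character, the `<s>`/`</s>` boundary markers, the `floor` probability for unseen transitions, and the function names `train` and `segment`.

```python
import math
from collections import defaultdict
from itertools import combinations

START, END = "<s>", "</s>"  # hypothetical boundary markers

# Hypothetical training data: addresses already split into elements.
TRAINING = [
    ["湖北省", "武汉市", "洪山区", "珞喻路", "129号"],
    ["湖北省", "宜昌市", "西陵区", "东山路", "12号"],
]

def train(segmented_corpus):
    """Estimate feature characters and their transition probabilities.

    Assumption: the last character of each element is its feature character.
    """
    counts = defaultdict(lambda: defaultdict(int))
    features = set()
    for elements in segmented_corpus:
        tags = [START] + [e[-1] for e in elements] + [END]
        features.update(tags[1:-1])
        for a, b in zip(tags, tags[1:]):
            counts[a][b] += 1
    probs = {
        a: {b: n / sum(row.values()) for b, n in row.items()}
        for a, row in counts.items()
    }
    return features, probs

def segment(text, features, probs, floor=1e-6):
    """Return the split of `text` with the highest joint transition probability.

    Every feature character is a candidate element end; the end of the
    string is always a boundary. Unseen transitions get probability `floor`.
    """
    candidates = [i + 1 for i, ch in enumerate(text[:-1]) if ch in features]
    best_bounds, best_score = [len(text)], -math.inf
    for k in range(len(candidates) + 1):
        for cuts in combinations(candidates, k):
            bounds = list(cuts) + [len(text)]
            starts = [0] + bounds[:-1]
            # Skip splits that leave an element shorter than two characters.
            if any(b - a < 2 for a, b in zip(starts, bounds)):
                continue
            tags = [START] + [text[b - 1] for b in bounds] + [END]
            # Sum of log probabilities = log of the joint probability.
            score = sum(
                math.log(probs.get(a, {}).get(b, floor))
                for a, b in zip(tags, tags[1:])
            )
            if score > best_score:
                best_bounds, best_score = bounds, score
    starts = [0] + best_bounds[:-1]
    return [text[a:b] for a, b in zip(starts, best_bounds)]

features, probs = train(TRAINING)
print(segment("湖北省武汉市江岸区省府路5号", features, probs))
```

In the sample address, the second 省 belongs to a road name rather than marking a province, and the transition statistics are what steer the search away from cutting there. The exhaustive enumeration of cut sets is exponential in the number of candidate positions and is kept only for readability; a dynamic-programming (Viterbi-style) search would be the usual choice for longer strings.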