• Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4, 2006, Vol. 28, Issue 6: 135-139.


Chinese Named Entity Recognition Based on a Mixed Statistical Model

张晓艳 (ZHANG Xiao-yan), 王挺 (WANG Ting), 陈火旺 (CHEN Huo-wang)

  • Publication date: 2006-06-01   Release date: 2010-05-20


Abstract:

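For three important types of named entities, namely personal names, location names, and organization names, this paper proposes a Chinese named entity recognition method that combines a hidden Markov model (HMM) with a maximum entropy (ME) model. The method has two distinguishing features. First, it integrates named entity recognition and part-of-speech tagging into a single task. Second, it fuses two statistical models for named entity recognition: the HMM constrains recognition globally (over the whole sentence), while the ME model works locally (within the context of the current word) to estimate the probability that a word string is labeled as a given type of named entity. Experiments show that the method recognizes these three types of named entities well.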
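The local half of the method, an ME model estimating how likely a word string is to carry each entity label given its context, can be illustrated with a standard log-linear (maximum entropy) classifier. This is a minimal sketch, not the authors' feature set: the function `maxent_prob`, the label names `PER`/`LOC`/`ORG`/`O`, the three context features, and the hand-set weights are all hypothetical. In a real system the weights would be learned from annotated data.

```python
import math
from typing import Callable, Dict, List

# Context of a candidate word string: previous word, the string itself, next word.
Context = Dict[str, str]
# A binary feature function fires (returns 1) for a given context and label.
Feature = Callable[[Context, str], int]


def maxent_prob(context: Context, labels: List[str],
                features: List[Feature], weights: List[float]) -> Dict[str, float]:
    """p(y | context) = exp(sum_i w_i * f_i(context, y)) / Z(context)."""
    scores = {y: sum(w * f(context, y) for f, w in zip(features, weights))
              for y in labels}
    top = max(scores.values())  # shift by the max score for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())  # normalizer Z(context), summed over candidate labels
    return {y: e / z for y, e in exps.items()}


# Hypothetical features and hand-set weights, for illustration only.
LABELS = ["PER", "LOC", "ORG", "O"]
FEATURES: List[Feature] = [
    lambda c, y: int(y == "PER" and c["next"] == "说"),          # followed by "said"
    lambda c, y: int(y == "LOC" and c["prev"] == "在"),          # preceded by "at/in"
    lambda c, y: int(y == "ORG" and c["cur"].endswith("公司")),  # ends in "company"
]
WEIGHTS = [1.5, 1.2, 2.0]

probs = maxent_prob({"prev": "在", "cur": "北京", "next": "工作"},
                    LABELS, FEATURES, WEIGHTS)
print(max(probs, key=probs.get))  # LOC
```

Only the location feature fires for "北京" after "在", so `LOC` receives the highest probability. Because the model only looks at a small window around the current word, it says nothing about whether the resulting label sequence is coherent across the sentence; that is the role the abstract assigns to the HMM.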

Keywords: named entity recognition, hidden Markov model, maximum entropy model

Abstract:

This paper presents a method for Chinese named entity (NE) recognition using a mixed statistical model. It focuses on three types of NEs: personal names, location names, and organization names. The method has two characteristic aspects. First, it provides a unified framework that integrates NE recognition and part-of-speech tagging. Second, it makes use of two statistical models: an HMM constrains recognition over the scope of a whole sentence, and an ME model calculates the probability of an entity given its local context. Experimental results show that the method can effectively recognize the three types of named entities mentioned above.
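One way to let an HMM impose sentence-level constraints on locally estimated label probabilities is Viterbi decoding over a joint tag set, where tag-to-tag transition probabilities come from the HMM and the per-position scores come from a local model such as ME. The sketch below assumes that combination (sum of log transition probability and log local probability); the abstract does not give the paper's exact scoring formula. The function `viterbi`, the probability tables, and the small tag set are hypothetical; NE types appear as POS-like tags (`nr` person, `ns` place, in the style of the Peking University tag set) to mirror the integration of NE recognition with POS tagging.

```python
import math
from typing import Dict, List, Tuple

NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    """log(p), with log(0) mapped to -inf so impossible paths are pruned."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(tags: List[str], start: Dict[str, float],
            trans: Dict[Tuple[str, str], float],
            local: List[Dict[str, float]]) -> List[str]:
    """Best tag sequence maximizing sum of log P(t_i | t_{i-1}) + log P_local(t_i | i).

    start[t]       -- HMM probability of tag t at the first position
    trans[(a, b)]  -- HMM probability of tag b following tag a (missing = 0)
    local[i][t]    -- locally estimated probability of tag t at position i
    """
    n = len(local)
    if n == 0:
        return []
    delta = [{t: safe_log(start.get(t, 0.0)) + safe_log(local[0].get(t, 0.0))
              for t in tags}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, n):
        delta.append({})
        back.append({})
        for t in tags:
            best_prev, best = None, NEG_INF
            for s in tags:
                score = delta[i - 1][s] + safe_log(trans.get((s, t), 0.0))
                if best_prev is None or score > best:
                    best_prev, best = s, score
            delta[i][t] = best + safe_log(local[i].get(t, 0.0))
            back[i][t] = best_prev
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: delta[n - 1][t])]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical example: 张三 在 北京 工作 ("Zhang San works in Beijing").
TAGS = ["nr", "ns", "p", "v"]
START = {"nr": 0.6, "ns": 0.3, "p": 0.05, "v": 0.05}
TRANS = {("nr", "p"): 0.5, ("nr", "v"): 0.5, ("ns", "p"): 0.3, ("ns", "v"): 0.7,
         ("p", "ns"): 0.6, ("p", "nr"): 0.4, ("v", "nr"): 0.5, ("v", "ns"): 0.5}
LOCAL = [{"nr": 0.9, "ns": 0.1}, {"p": 1.0},
         {"nr": 0.5, "ns": 0.5},  # the local model alone cannot decide for 北京
         {"v": 1.0}]
print(viterbi(TAGS, START, TRANS, LOCAL))  # ['nr', 'p', 'ns', 'v']
```

At position 3 the local scores for 北京 are tied, and the transition probabilities (a place name after 在, then a verb) break the tie in favour of `ns`. This is the kind of sentence-wide constraint a local classifier cannot provide on its own.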

Keywords: named entity recognition, hidden Markov model (HMM), maximum entropy (ME) model