• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (08): 1473-1481.

• Artificial Intelligence and Data Mining •

  • Supported by:
    National Natural Science Foundation of China (61972148)

A Chinese named entity recognition model based on multi-feature fusion embedding

LIU Xiao-hua1, XU Ru-zhi1, YANG Cheng-yue2

  • 1. School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China;
    2. Big Data Center of State Grid Corporation of China, Beijing 100052, China
  • Received:2023-03-31 Revised:2023-06-14 Accepted:2024-08-25 Online:2024-08-25 Published:2024-09-02


Abstract:

To address variation in Chinese glyphs and the ambiguity of Chinese word boundaries, a Chinese named entity recognition model based on multi-feature fusion embedding is proposed. On top of the extracted semantic features, glyph features are captured by a convolutional neural network combined with a multi-head self-attention mechanism, and word features are obtained by lookup in a word embedding table. A bidirectional long short-term memory (BiLSTM) network then learns long-range contextual representations, and finally a conditional random field (CRF) learns the constraints among sentence-level label sequences to perform Chinese named entity recognition. The model achieves F1 scores of 96.66%, 70.84%, and 96.15% on the Resume, Weibo, and People Daily datasets, respectively, demonstrating that it effectively improves performance on Chinese named entity recognition tasks.
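The final CRF step in the abstract refers to learning constraints among label sequences, e.g. that an I- (inside) tag cannot follow O under the BIO scheme. A minimal pure-Python sketch of Viterbi decoding under such transition constraints is shown below; the tag set, emission scores, and transition scores are illustrative placeholders, not values from the paper's trained model:

```python
import math

def viterbi_decode(emissions, transitions, tags):
    """Find the highest-scoring tag sequence under a linear-chain CRF.

    emissions:   list of {tag: score} dicts, one per token (e.g. BiLSTM outputs)
    transitions: {(prev_tag, tag): score}; a very low score for a pair such as
                 ("O", "I-PER") effectively forbids that invalid BIO transition
    tags:        the tag vocabulary
    """
    # Initialize with the first token's emission scores.
    score = {t: emissions[0][t] for t in tags}
    backptr = []
    for emit in emissions[1:]:
        new_score, ptr = {}, {}
        for t in tags:
            # Pick the previous tag maximizing path score + transition score.
            best_prev = max(
                tags, key=lambda p: score[p] + transitions.get((p, t), -math.inf)
            )
            new_score[t] = (
                score[best_prev] + transitions.get((best_prev, t), -math.inf) + emit[t]
            )
            ptr[t] = best_prev
        backptr.append(ptr)
        score = new_score
    # Backtrace from the best final tag.
    best = max(tags, key=lambda t: score[t])
    path = [best]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

With a forbidden O → I-PER transition, the decoder prefers the valid sequence B-PER, I-PER even when the first token's emission slightly favors O, which is exactly the kind of label-sequence constraint the CRF layer contributes.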


Key words: named entity recognition, feature fusion, multi-head self-attention mechanism