• A publication of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (10): 1848-1855.

• Artificial Intelligence and Data Mining •

Military named entity recognition based on self-attention and Lattice-LSTM

LI Hong-fei1,2, LIU Pan-yu3, WEI Yong2

  (1. School of Geospatial Information, PLA SSF Information Engineering University, Zhengzhou, Henan 450052;

  2. Troop 31008, Beijing 100091;

  3. College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan 410073, China)

  • Received: 2020-09-28  Revised: 2020-12-30  Accepted: 2021-10-25  Online: 2021-10-25  Published: 2021-10-22
  • About the author: LI Hong-fei, born in 1974 in Harbin, Heilongjiang, PhD, senior engineer; his research interests include data engineering.
  • Funding: National Natural Science Foundation of China (61922088)


Abstract: Military named entity recognition can provide automated support for intelligence analysis, command and decision-making, and other tasks, and is a key technique for making command information systems more intelligent. Because Chinese differs from English, the first step of entity recognition in Chinese text is word segmentation, and segmentation errors translate directly into losses in named entity recognition accuracy. Moreover, recognizing the named entities in a passage depends on context: not every word contributes positively to recognition, and redundant word information only hurts it. To address these challenges, this paper proposes a fusion network model that combines a Lattice long short-term memory network (Lattice-LSTM) with a self-attention mechanism. The Lattice-LSTM structure recognizes special words in a sentence and aggregates deep word-level information into a character-based LSTM-CRF model. The self-attention structure captures relational or semantic features between words in the same sentence. Experiments on a small, manually annotated sample set show that the model achieves better results than several baseline models.
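The self-attention component described in the abstract can be illustrated with a minimal sketch: a generic scaled dot-product self-attention over a sequence of character hidden states. The dimensions, random weights, and variable names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def self_attention(H, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of hidden states.

    H: (seq_len, d) matrix of hidden states (e.g. LSTM outputs), one row
    per character. Returns a (seq_len, d) matrix in which each row mixes
    information from every position, weighted by pairwise relevance.
    """
    Q, K, V = H @ W_q, H @ W_k, H @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])           # pairwise relevance
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
H = rng.standard_normal((5, 8))      # 5 characters, hidden size 8
W_q = rng.standard_normal((8, 8))    # illustrative random projections
W_k = rng.standard_normal((8, 8))
W_v = rng.standard_normal((8, 8))
out = self_attention(H, W_q, W_k, W_v)
print(out.shape)  # (5, 8)
```

In the model described here, such a layer would sit over the character-level LSTM outputs, letting each position attend to every other position in the sentence before the CRF decoding step.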


Keywords: named entity recognition, Lattice, self-attention


