Military named entity recognition based on self-attention and Lattice-LSTM

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (10): 1848-1855.

LI Hong-fei1,2, LIU Pan-yu3, WEI Yong2

(1. School of Geospatial Information, PLA SSF Information Engineering University, Zhengzhou 450052;

2. Troop 31008, Beijing 100091;

3. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)

Received: 2020-09-28 Revised: 2020-12-30 Accepted: 2021-10-25 Online: 2021-10-25 Published: 2021-10-22
About author: LI Hong-fei, born in 1974 in Harbin, Heilongjiang, PhD, senior engineer; his research interests include data engineering.
Supported by:
National Natural Science Foundation of China (61922088)

Abstract: Military named entity recognition provides automated support for intelligence analysis and command decision-making, and is a key technology for making command information systems more intelligent. Because Chinese and English differ in their linguistic characteristics, Chinese entity recognition typically begins with word segmentation, and segmentation errors translate directly into a loss of recognition accuracy. In addition, recognizing a named entity in a passage depends on its context, yet not every character or word contributes positively: redundant word information can only degrade recognition. To address these problems, we propose a fusion network model that combines a Lattice long short-term memory (LSTM) network with a self-attention mechanism. The Lattice-LSTM structure identifies special words, such as proper nouns, in a sentence and integrates the latent word information into a character-based LSTM-CRF model. The self-attention structure captures relational (syntactic) or semantic features between words in the same sentence. Experiments on a small, manually annotated sample set show that the model outperforms several baseline models.

Key words: named entity recognition, Lattice, self-attention
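The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The first function enumerates the lexicon words that a lattice would attach to a character sequence, and the second applies scaled dot-product self-attention with no learned projections (queries, keys, and values are all the input vectors). The function names, the `max_len` parameter, and the example sentence and lexicon are assumptions chosen for illustration and do not come from the paper's dataset.

```python
import math


def match_lattice_words(sentence, lexicon, max_len=4):
    """Return (start, end, word) for each lexicon word found in the sentence.

    `end` is exclusive. Only multi-character words are considered, since
    single characters are already handled by the character-level model.
    """
    spans = []
    n = len(sentence)
    for start in range(n):
        for end in range(start + 2, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word in lexicon:
                spans.append((start, end, word))
    return spans


def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = vectors.

    A real model would first apply learned linear projections; they are
    omitted here to keep the sketch dependency-free.
    """
    d = len(vectors[0])
    outputs = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)]
        )
    return outputs


# Hypothetical example sentence and lexicon (illustrative only).
example_lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
for span in match_lattice_words("南京市长江大桥", example_lexicon):
    print(span)
```

Each matched span corresponds to a candidate word whose information a lattice model can merge into the character position where the word ends, which avoids committing to a single segmentation up front. The attention function returns, for every position, a weighted average of all positions, which is how relationships between distant tokens in the same sentence can be taken into account.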

LI Hong-fei, LIU Pan-yu, WEI Yong. Military named entity recognition based on self-attention and Lattice-LSTM[J]. Computer Engineering & Science, 2021, 43(10): 1848-1855.
