RIB-NER: A span-based Chinese named entity recognition model

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (07): 1311-1320.

TIAN Hong-peng, WU Jing-wei

(College of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an 710600, China)

Received: 2023-03-24 Revised: 2023-09-13 Accepted: 2024-07-25 Online: 2024-07-25 Published: 2024-07-19

Abstract

Abstract: Named entity recognition (NER) is an important foundation for many downstream tasks in natural language processing. As an important international language, Chinese is unique in many respects. Chinese NER models have traditionally used sequence labeling, which relies on a conditional random field to capture label dependencies; however, this approach is prone to label misclassification. To address this problem, a span-based named entity recognition model, RIB-NER, is proposed. First, RoBERTa-wwm-ext serves as the embedding layer and provides character-level embeddings that carry richer contextual semantic and lexical information. Second, the parallel convolution kernels of an iterated dilated convolutional neural network (IDCNN) strengthen the positional information between words, tying them more closely together. At the same time, a BiLSTM network is integrated into the model to capture contextual information. Finally, a biaffine model scores the start and end tokens in the sentence, and these tokens are used to explore spans. Experiments on the MSRA and Weibo corpora show that RIB-NER identifies entity boundaries fairly accurately, achieving F1 scores of 95.11% and 73.94%, respectively, and outperforms traditional deep learning approaches.

Key words: Chinese named entity recognition, biaffine model, iterated dilated convolutional neural network, pre-trained model, span
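
The final step of the abstract, scoring start and end tokens with a biaffine model and then exploring spans, can be illustrated with a small sketch. This is not the authors' implementation. It assumes the commonly used biaffine form, score = h_s^T U h_e + W [h_s; h_e] + b, with one parameter set per label, and a greedy non-overlapping decoding rule; the abstract specifies neither. The label set, the function names, and the random vectors standing in for the RoBERTa-wwm-ext, IDCNN, and BiLSTM outputs are all hypothetical.

```python
import random

# Hypothetical label set; index 0 means "not an entity".
LABELS = ["O", "PER", "LOC", "ORG"]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def biaffine_score(hs, he, U, W, b):
    """Score one (start, end) pair for one label: hs^T U he + W . [hs; he] + b."""
    bilinear = sum(hs[p] * dot(U[p], he) for p in range(len(hs)))
    linear = dot(W, hs + he)  # list concatenation builds [hs; he]
    return bilinear + linear + b


def score_spans(h_start, h_end, params):
    """Return {(i, j): [score per label]} for every span with i <= j."""
    n = len(h_start)
    table = {}
    for i in range(n):
        for j in range(i, n):
            table[(i, j)] = [
                biaffine_score(h_start[i], h_end[j], U, W, b)
                for (U, W, b) in params
            ]
    return table


def decode(table):
    """Keep the best-scoring non-"O" spans, greedily rejecting overlaps
    (an assumed decoding rule for flat, non-nested entities)."""
    candidates = []
    for (i, j), scores in table.items():
        best = max(range(len(scores)), key=scores.__getitem__)
        if best != 0:
            candidates.append((scores[best], i, j, LABELS[best]))
    candidates.sort(reverse=True)  # highest score first
    chosen = []
    for _, i, j, label in candidates:
        if all(j < s or i > e for s, e, _ in chosen):
            chosen.append((i, j, label))
    return sorted(chosen)


# Demo: random vectors stand in for the encoder outputs of a 6-character sentence.
rng = random.Random(0)
d, n = 4, 6


def rand_vec(size):
    return [rng.uniform(-1, 1) for _ in range(size)]


h_start = [rand_vec(d) for _ in range(n)]  # start-token representations
h_end = [rand_vec(d) for _ in range(n)]    # end-token representations
params = [([rand_vec(d) for _ in range(d)], rand_vec(2 * d), rng.uniform(-1, 1))
          for _ in LABELS]

table = score_spans(h_start, h_end, params)
spans = decode(table)
print(spans)  # (start, end, label) triples; untrained weights, so not meaningful
```

Because every (start, end) pair is scored directly, an entity's boundaries and type are decided together, rather than being assembled from per-character tags as in sequence labeling.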

TIAN Hong-peng, WU Jing-wei. RIB-NER: A span-based Chinese named entity recognition model[J]. Computer Engineering & Science, 2024, 46(07): 1311-1320.
