• Journal of the China Computer Federation
  • Chinese Core Journal of Science and Technology
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (05): 916-928.

• Artificial Intelligence and Data Mining •

Entity relation extraction based on prejudgment and multi-round classification for span

TONG Yuan, YAO Nian-min

  1. (School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China)
  • Received: 2023-02-06; Revised: 2023-04-19; Accepted: 2024-05-25; Online: 2024-05-25; Published: 2024-05-30

Abstract: Aiming at the entity recognition and relation extraction tasks in natural language processing, a model named Smrc is proposed that makes predictions at the level of token sequences (spans). The model uses the pre-trained BERT model as its encoder and comprises three modules: entity pre-judgment (Pej), entity multi-round classification (Emr), and relation multi-round classification (Rmr). Smrc performs entity recognition through the preliminary judgment of the Pej module followed by the multi-round entity classification of the Emr module, and then applies the Rmr module's multi-round relation classification to determine the relations between entity pairs, thereby completing relation extraction. On the CoNLL04, SciERC, and ADE datasets, Smrc achieves entity recognition F1 scores of 89.67%, 70.62%, and 89.56%, and relation extraction F1 scores of 73.11%, 51.03%, and 79.89%, respectively. Compared with Spert, the previous best model on these three datasets, Smrc improves F1 by 0.73%, 0.29%, and 0.61% on entity recognition and by 1.64%, 0.19%, and 1.05% on relation extraction, owing to its entity pre-judgment and multi-round classification of entities and relations, demonstrating the model's effectiveness and advantages.
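The pipeline the abstract describes (enumerate candidate spans, filter them with a cheap pre-judgment step, then classify entities and entity-pair relations over several rounds) can be sketched as follows. This is a toy illustration only, not the paper's implementation: the hand-written lexicon and rule lookups stand in for the BERT encoder and the learned Pej/Emr/Rmr classifiers, and all function names here are invented for the sketch.

```python
from itertools import combinations

def enumerate_spans(tokens, max_len=3):
    """All token sequences (spans) of up to max_len tokens, as (start, end) pairs."""
    return [(i, j) for i in range(len(tokens))
            for j in range(i + 1, min(i + max_len, len(tokens)) + 1)]

def prejudge_span(tokens, span, entity_lexicon):
    """Pej stand-in: cheap binary filter that discards spans unlikely
    to be entities before the costlier classification rounds."""
    return " ".join(tokens[span[0]:span[1]]) in entity_lexicon

def classify_entities(tokens, spans, entity_lexicon, rounds=2):
    """Emr stand-in: assign an entity label to each surviving span,
    repeating for several rounds (in the real model, later rounds can
    condition on the predictions of earlier ones)."""
    labels = {}
    for _ in range(rounds):
        labels = {s: entity_lexicon[" ".join(tokens[s[0]:s[1]])] for s in spans}
    return labels

def classify_relations(entity_labels, relation_rules, rounds=2):
    """Rmr stand-in: label ordered pairs of recognized entities."""
    relations = {}
    for _ in range(rounds):
        for a, b in combinations(sorted(entity_labels), 2):
            pair = (entity_labels[a], entity_labels[b])
            if pair in relation_rules:
                relations[(a, b)] = relation_rules[pair]
    return relations

# Tiny worked example (lexicon and relation rules are made up for illustration).
tokens = "John works at Acme".split()
entity_lexicon = {"John": "PER", "Acme": "ORG"}
relation_rules = {("PER", "ORG"): "Works_For"}

spans = [s for s in enumerate_spans(tokens) if prejudge_span(tokens, s, entity_lexicon)]
entities = classify_entities(tokens, spans, entity_lexicon)
relations = classify_relations(entities, relation_rules)
```

On this example the pre-judgment keeps only the spans "John" and "Acme", the entity rounds label them PER and ORG, and the relation rounds emit a Works_For relation between them; the multi-round loops are trivial here because the toy classifiers are deterministic.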

Key words: pre-judgment of spans, entity relation extraction, pre-trained BERT model, multi-round entity classification, multi-round relation classification