• Official journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2023, Vol. 45, Issue (04): 711-717.

• Artificial Intelligence and Data Mining

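Event extraction technology based on ALBERT pre-trained model

DU Jie, LUO Li-ming, SUN Zhong

  1. (College of Information Engineering, Capital Normal University, Beijing 100048, China)
  • Received: 2021-04-02 Revised: 2021-09-14 Accepted: 2023-04-25 Online: 2023-04-25 Published: 2023-04-13
  • Funding:
    National Natural Science Foundation of China (61977048)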

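Abstract: Information extraction techniques are used to extract information of high interest from unstructured text. Event extraction is a challenging research direction within information extraction; its goal is to extract the key elements that describe an event from unstructured text and present them in a structured form. This work treats event extraction as a sequence labeling task: first, the ALBERT pre-trained model is used to learn features; next, a conditional random field (CRF) model is introduced to improve sequence labeling performance; finally, event types and event elements are identified and classified. Experimental results on the standard ACE2005 corpus show that, compared with existing models, the ALBERT-CRF model improves both recall and F-score on trigger word identification and classification.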

Key words: event extraction, sequence labeling, ALBERT model, conditional random field model
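The abstract frames trigger identification and classification as sequence labeling, with a CRF layer placed on top of ALBERT features. As a rough illustration only, the sketch below shows what the CRF decoding step contributes: a Viterbi search over BIO tags that rules out invalid moves such as O → I-Attack. It is not the authors' code. The two-type tag set, the toy sentence, and the hand-set emission scores (standing in for per-token ALBERT outputs) are all hypothetical, and the transition matrix is hard-coded rather than learned.

```python
"""Sketch: BIO trigger tagging decoded with linear-chain CRF Viterbi search.

Hypothetical throughout: the tag set, toy sentence, and hand-set scores are
illustrative stand-ins, not values or code from the ALBERT-CRF paper.
"""
from typing import List, Optional, Tuple

# Hypothetical BIO tag set covering two ACE2005 event types.
TAGS = ["O", "B-Attack", "I-Attack", "B-Transport", "I-Transport"]
NEG = -1e4  # large negative score used to forbid a transition


def bio_transitions(tags: List[str]) -> List[List[float]]:
    """transitions[i][j] = 0.0 if tags[i] -> tags[j] is valid BIO, else NEG."""
    matrix = []
    for prev in tags:
        row = []
        for cur in tags:
            # An I-X tag may only follow B-X or I-X of the same event type.
            ok = not cur.startswith("I-") or prev in ("B-" + cur[2:], cur)
            row.append(0.0 if ok else NEG)
        matrix.append(row)
    return matrix


# A sequence may not start inside a trigger.
START = [NEG if t.startswith("I-") else 0.0 for t in TAGS]


def viterbi(emissions: List[List[float]],
            transitions: List[List[float]],
            start: List[float]) -> List[int]:
    """Return the highest-scoring tag-index path under a linear-chain CRF."""
    n_tags = len(start)
    # score[j]: best total score of any path ending in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backpointers):
        best = ptrs[best]
        path.append(best)
    return path[::-1]


def tags_to_triggers(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO tags into (trigger text, event type) pairs; stray I- tags are dropped."""
    triggers: List[Tuple[str, str]] = []
    span: List[str] = []
    etype: Optional[str] = None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("I-") and etype == tag[2:]:
            span.append(tok)  # continue the open trigger
            continue
        if etype is not None:
            triggers.append((" ".join(span), etype))  # close the open trigger
        if tag.startswith("B-"):
            span, etype = [tok], tag[2:]  # open a new trigger
        else:
            span, etype = [], None  # "O" or an I- tag with no matching B-
    if etype is not None:
        triggers.append((" ".join(span), etype))
    return triggers


# Toy sentence and hand-set emission scores (columns follow TAGS order).
TOKENS = ["Rebels", "opened", "fire", "on", "troops"]
EMISSIONS = [
    [2.0, 0.1, 0.0, 0.1, 0.0],
    [1.2, 1.0, 0.0, 0.2, 0.0],
    [0.5, 1.0, 2.5, 0.0, 0.3],
    [2.0, 0.0, 0.1, 0.0, 0.0],
    [2.0, 0.1, 0.0, 0.1, 0.0],
]

if __name__ == "__main__":
    best = [TAGS[i] for i in viterbi(EMISSIONS, bio_transitions(TAGS), START)]
    print(best)                            # ['O', 'B-Attack', 'I-Attack', 'O', 'O']
    print(tags_to_triggers(TOKENS, best))  # [('opened fire', 'Attack')]
```

With these toy scores, a per-token argmax would produce the invalid sequence O, O, I-Attack, O, O and recover no trigger, whereas the constrained Viterbi path labels "opened fire" as an Attack trigger. In a trained CRF the transition scores are learned parameters, so invalid moves are typically penalized rather than strictly forbidden unless they are masked explicitly as in this sketch.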