• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2020, Vol. 42 ›› Issue (09): 1670-1679.

• Artificial Intelligence and Data Mining •

Biomedical event extraction based on deep contextual word representation and self-attention

WEI You1,2, LIU Mao-fu1,2, HU Hui-jun1,2

  1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China

  2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan 430065, China

  • Received: 2019-09-23 Revised: 2020-03-10 Accepted: 2020-09-25 Online: 2020-09-25 Published: 2020-09-25
  • Supported by:
    National Social Science Foundation of China (11&ZD189); Humanities and Social Sciences Research Project of the Hubei Provincial Department of Education (17Y018)

Abstract: Biomedical event extraction is one of the most important and challenging tasks in biomedical text information extraction and has attracted wide attention in recent years. Its two most important subtasks are trigger recognition and event argument detection. Most existing methods treat trigger recognition as a classification task and ignore sentence-level tag information. Therefore, a sequence labeling model based on bidirectional long short-term memory (Bi-LSTM) and a conditional random field (CRF) is constructed for trigger recognition. Two kinds of model input are used separately: static pre-trained word embeddings combined with character-level word representations, and dynamic contextual word representations from a pre-trained language model. For event argument detection, a self-attention based multi-class classification model is proposed that makes full use of entity and entity type features. The F1-scores of trigger recognition and overall event extraction are 81.65% and 60.04%, respectively, and the experimental results show that the proposed method is effective for biomedical event extraction.


Key words: biomedical event extraction, sequence labeling, contextual word representation, self-attention
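
Illustrative sketch (not the authors' implementation): the abstract contrasts per-token classification with sequence labeling that uses sentence-level tag information. The toy example below shows that difference with Viterbi decoding, the standard way to read out the best tag sequence from a CRF layer. Everything specific here is an assumption made for illustration: the three-tag BIO scheme (`B-Trigger`, `I-Trigger`, `O`), the example tokens, the hand-written emission scores (in the paper's setting these would come from the Bi-LSTM), and the transition scores that forbid `I-Trigger` without a preceding trigger tag.

```python
from typing import Dict, List, Tuple

# Hypothetical tag set; the paper's actual tags are not given in the abstract.
TAGS = ["O", "B-Trigger", "I-Trigger"]
START = "<START>"
FORBIDDEN = -1e4  # large negative score that effectively rules a transition out

# Toy sentence with a two-word trigger ("negative regulation").
TOKENS = ["negative", "regulation", "of", "IL-2"]

# Made-up per-token scores standing in for Bi-LSTM outputs.
EMISSIONS: List[Dict[str, float]] = [
    {"O": 1.0, "B-Trigger": 0.8, "I-Trigger": 0.1},
    {"O": 0.2, "B-Trigger": 0.9, "I-Trigger": 1.5},
    {"O": 2.0, "B-Trigger": 0.0, "I-Trigger": 0.1},
    {"O": 2.0, "B-Trigger": 0.1, "I-Trigger": 0.0},
]

# Made-up transition scores: I-Trigger may not start a sentence or follow O.
# Any transition not listed is scored 0.0.
TRANSITIONS: Dict[Tuple[str, str], float] = {
    (START, "I-Trigger"): FORBIDDEN,
    ("O", "I-Trigger"): FORBIDDEN,
}


def greedy_decode(emissions: List[Dict[str, float]]) -> List[str]:
    """Classify each token independently, ignoring neighbouring tags."""
    return [max(TAGS, key=lambda tag: scores[tag]) for scores in emissions]


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the tag sequence with the highest total emission + transition score."""
    # best[t][tag]: score of the best path that ends in `tag` at position t.
    best = [
        {tag: transitions.get((START, tag), 0.0) + emissions[0][tag] for tag in TAGS}
    ]
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        row: Dict[str, float] = {}
        pointer: Dict[str, str] = {}
        for cur in TAGS:
            prev = max(
                TAGS, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0)
            )
            row[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + scores[cur]
            pointer[cur] = prev
        best.append(row)
        backpointers.append(pointer)
    # Follow the backpointers from the best final tag to recover the path.
    path = [max(TAGS, key=lambda tag: best[-1][tag])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(list(zip(TOKENS, greedy_decode(EMISSIONS))))
    print(list(zip(TOKENS, viterbi_decode(EMISSIONS, TRANSITIONS))))
```

With these toy scores, independent classification labels "regulation" as `I-Trigger` directly after an `O` tag, which is not a valid BIO sequence, while Viterbi decoding tags "negative regulation" as `B-Trigger I-Trigger`. In a trained CRF the transition scores are learned rather than written by hand.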