• Official journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2024, Vol. 46, Issue (02): 316-324.

• Artificial Intelligence and Data Mining

Distant supervision relation extraction based on type attention and GCN

ZHANG Huan1,2, LI Wei-jiang1,2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
  2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China
  • Received: 2022-05-06 Revised: 2022-08-10 Accepted: 2024-02-25 Online: 2024-02-25 Published: 2024-02-24
  • Funding:
    National Natural Science Foundation of China (62066022)

Abstract: Distant supervision relation extraction generates labeled training data by automatically aligning natural language text with a knowledge base, which removes the need for manual annotation. Most current work on distant supervision pays little attention to long-tail data: most of the sentence bags obtained by distant supervision contain too few sentences to reflect the data faithfully and comprehensively. This paper therefore proposes PG+PTATT, a distant supervision relation extraction model based on a position-type attention mechanism and a graph convolutional network (GCN). The GCN aggregates the implicit high-order features of similar sentence bags, refining each bag so that it carries richer and more comprehensive feature information. In addition, a position-type attention mechanism (PTATT) is constructed to address the wrong-label problem in distant supervision relation extraction. PTATT models the position relationships and the type relationships between entity words and non-entity words in order to reduce the influence of noisy words. Experiments on the New York Times (NYT) dataset show that the proposed model effectively addresses these problems and improves the accuracy of relation extraction.

Key words: distant supervision, relation extraction, graph convolutional network, attention mechanism, type relationship, sentence bag
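
The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the model's equations. The first part applies the standard GCN propagation rule, ReLU(D^-1/2 A D^-1/2 H W), to a graph of sentence bags; linking bags by a cosine-similarity threshold is an assumption made here for illustration. The second part is a hypothetical position-type attention: the scoring function, the weights `alpha` and `beta`, and the `type_match` indicator are all invented for this example and only show the general idea of down-weighting words that are far from both entities and unrelated to their types.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def similarity_graph(bags, threshold=0.8):
    """Adjacency matrix with self-loops; bags are linked when their cosine
    similarity reaches the threshold (assumed criterion, not from the paper)."""
    n = len(bags)
    return [[1.0 if i == j or cosine(bags[i], bags[j]) >= threshold else 0.0
             for j in range(n)] for i in range(n)]


def gcn_layer(bags, adj, weight):
    """One GCN layer: ReLU(D^-1/2 A D^-1/2 H W).
    adj must already contain self-loops, so every degree is at least 1."""
    n = len(bags)
    deg = [sum(row) for row in adj]
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    d_in, d_out = len(weight), len(weight[0])
    # Aggregate features of neighbouring (similar) bags.
    agg = [[sum(norm[i][k] * bags[k][f] for k in range(n)) for f in range(d_in)]
           for i in range(n)]
    # Linear map followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(d_in)))
             for o in range(d_out)] for i in range(n)]


def position_type_attention(word_vecs, head_idx, tail_idx, type_match,
                            alpha=0.5, beta=1.0):
    """Hypothetical scoring: penalise distance to the nearer entity and
    reward words flagged as type-related, then softmax over the sentence."""
    scores = [-alpha * min(abs(i - head_idx), abs(i - tail_idx))
              + beta * type_match[i] for i in range(len(word_vecs))]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(word_vecs[0])
    sentence = [sum(w * vec[f] for w, vec in zip(weights, word_vecs))
                for f in range(dim)]
    return weights, sentence


# Three toy bag vectors: the first two are similar, the third is not.
bags = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
adj = similarity_graph(bags)
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for a learned weight matrix
refined = gcn_layer(bags, adj, identity)

# Four toy word vectors; entities sit at positions 0 and 3.
words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]
weights, sentence = position_type_attention(words, 0, 3, [1, 0, 0, 1])
```

In this toy run the two similar bags end up sharing averaged features while the dissimilar bag is left unchanged, and the attention weights concentrate on the entity positions. In a trained model the weight matrix and the attention parameters would be learned rather than fixed.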