Distant supervision relation extraction based on type attention and GCN

Computer Engineering & Science, 2024, Vol. 46, Issue (02): 316-324.

ZHANG Huan1,2, LI Wei-jiang1,2

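(1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China;
2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan 650500, China)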

Received: 2022-05-06; Revised: 2022-08-10; Accepted: 2024-02-25; Online: 2024-02-25; Published: 2024-02-24
Funding: National Natural Science Foundation of China (62066022)

Abstract

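Distant supervision relation extraction generates labeled training data by automatically aligning natural language text with a knowledge base, which removes the need for manual annotation. Most current distant supervision research, however, does not address long-tail data, so most of the sentence bags it produces contain too few sentences to represent the underlying data fully and faithfully. To address this, this paper proposes PG+PTATT, a distant supervision relation extraction model based on a position-type attention mechanism and a graph convolutional network. Guided by the similarity between sentence bags, a graph convolutional network (GCN) aggregates the implicit high-order features of similar bags and refines each bag, yielding richer and more comprehensive bag-level features. In addition, a position-type attention mechanism (PTATT) is constructed to tackle the wrong-label problem in distant supervision relation extraction: PTATT models both the positional relationships and the type relationships between entity words and non-entity words, reducing the influence of noisy words. Experiments on the New York Times (NYT) dataset show that the proposed model effectively addresses these problems in distant supervision relation extraction and improves relation extraction accuracy.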

Key words: distant supervision, relation extraction, graph convolutional network, attention mechanism, type relationship, sentence bag
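The abstract describes its two components only at a high level, so the following Python sketch illustrates the general ideas under our own assumptions; it is not the authors' PG+PTATT implementation. For the GCN part we assume that two sentence bags are linked when the cosine similarity of their feature vectors reaches a hypothetical `threshold`, and we apply one standard GCN layer, ReLU(D^-1/2 (A + I) D^-1/2 H W). For the attention part we assume a simple linear score per word: a penalty proportional to its distance from the nearest entity word plus a bonus from a per-word type-compatibility score. The names `type_scores`, `alpha` and `beta` are hypothetical stand-ins for whatever PTATT actually learns.

```python
import math

# ---- Part 1: one GCN layer over a sentence-bag similarity graph (assumed graph construction) ----

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def gcn_layer(bag_feats, weight, threshold=0.5):
    """Return ReLU(D^-1/2 (A + I) D^-1/2 H W).

    bag_feats: n x d matrix H, one feature row per sentence bag.
    weight:    d x m matrix W (learned in a real model; passed in here).
    threshold: hypothetical cosine-similarity cut-off for linking two bags.
    """
    n, d = len(bag_feats), len(bag_feats[0])
    # Adjacency with self-loops: link bags whose cosine similarity >= threshold.
    adj = [[1.0 if i == j or cosine(bag_feats[i], bag_feats[j]) >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in adj]  # every degree is >= 1 because of the self-loop
    # Aggregate neighbour features with symmetric degree normalisation.
    agg = [[sum(adj[i][j] / math.sqrt(deg[i] * deg[j]) * bag_feats[j][k] for j in range(n))
            for k in range(d)] for i in range(n)]
    # Linear transform by W, then ReLU.
    return [[max(0.0, sum(agg[i][k] * weight[k][m] for k in range(d)))
             for m in range(len(weight[0]))] for i in range(n)]


# ---- Part 2: position- and type-aware attention over the words of one sentence ----

def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def position_type_attention(word_vecs, entity_positions, type_scores, alpha=1.0, beta=1.0):
    """Pool token vectors into one sentence vector; returns (sentence_vec, weights).

    entity_positions: token indices of the head and tail entity words.
    type_scores: hypothetical per-token compatibility with the entity types.
    alpha, beta: hypothetical weights of the position and type terms.
    """
    scores = []
    for i in range(len(word_vecs)):
        dist = min(abs(i - p) for p in entity_positions)  # distance to the nearest entity word
        scores.append(-alpha * dist + beta * type_scores[i])
    weights = softmax(scores)
    dim = len(word_vecs[0])
    sentence = [sum(w * vec[k] for w, vec in zip(weights, word_vecs)) for k in range(dim)]
    return sentence, weights
```

With bags `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]` and an identity `W`, the first two bags are linked and receive the same averaged features, while the dissimilar third bag keeps its own. In a full model the attention-pooled sentence vectors of each bag would first be combined into a bag vector, and `W`, `alpha` and `beta` would be trained rather than fixed.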

ZHANG Huan, LI Wei-jiang. Distant supervision relation extraction based on type attention and GCN[J]. Computer Engineering & Science, 2024, 46(02): 316-324.
