A multi-hop knowledge graph reasoning method based on reinforcement learning

Computer Engineering & Science, 2026, Vol. 48, Issue 2: 256-267.

HAN Zheng, XU Ruzhi, LIU Xiaohua

(School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China)

Received: 2024-04-26 Revised: 2024-10-27 Online: 2026-02-25 Published: 2026-03-10

Funding: National Natural Science Foundation of China (62372173)

Abstract: In recent years, reinforcement learning has performed well on knowledge reasoning tasks, but it faces two challenges: agents tend to explore aimlessly, and rewards are delayed and sparse. To address these challenges, a multi-hop knowledge reasoning model based on reinforcement learning and predictive information embedding is proposed. First, a module for acquiring predictive embedding information is designed, and the resulting predictive information is embedded into the reinforcement learning framework, which addresses the problem of agents falling into aimless exploration and selecting invalid actions. Second, an action pruning mechanism that combines predictive information with the idea of Dropout is added to the agent's walk, alleviating the problem of an excessively large action space, and an LSTM stores the agent's historical decision information so that the agent can select the most promising action at each step. Finally, a new reward function designed from the predictive information mitigates the delayed and sparse reward problems. Experimental results on the WebQSP, PQL, and MetaQA datasets show that the model achieves strong performance on knowledge reasoning tasks and is well suited to multi-hop question answering over knowledge graphs.

Key words: knowledge graph, reinforcement learning, knowledge reasoning
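
The abstract does not give the formulas for the pruning mechanism or the reward function, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that candidate actions and the predictive information are available as embedding vectors, that candidates are scored by dot product, and that a missed answer earns a soft reward from cosine similarity. The names `prune_actions`, `shaped_reward`, `keep_k`, and `drop_rate` are hypothetical, and the LSTM history encoder and policy network are omitted.

```python
import numpy as np


def prune_actions(action_embs, pred_emb, keep_k, drop_rate, rng):
    """Return indices of candidate actions that survive pruning.

    Assumed scheme (not from the paper): rank actions by dot-product
    similarity to the predicted embedding, keep the top `keep_k`, then
    randomly mask survivors with probability `drop_rate` (Dropout-style).
    """
    scores = action_embs @ pred_emb
    k = min(keep_k, len(scores))
    top = np.argsort(-scores)[:k]          # indices sorted by descending score
    keep = rng.random(k) >= drop_rate      # True means the action is kept
    if not keep.any():
        keep[0] = True                     # never leave the agent without an action
    return top[keep]


def shaped_reward(reached_answer, final_emb, pred_emb):
    """Return 1.0 on a correct answer, otherwise a soft reward in [0, 1].

    Assumed shaping (not from the paper): cosine similarity between the
    final entity embedding and the predicted embedding, rescaled to [0, 1],
    so that failed walks still receive a graded signal.
    """
    if reached_answer:
        return 1.0
    denom = np.linalg.norm(final_emb) * np.linalg.norm(pred_emb) + 1e-12
    cos = float(final_emb @ pred_emb) / denom
    return 0.5 * (cos + 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    actions = rng.normal(size=(50, 16))    # 50 toy candidate actions
    predicted = rng.normal(size=16)        # toy predictive embedding
    kept = prune_actions(actions, predicted, keep_k=10, drop_rate=0.2, rng=rng)
    print("surviving actions:", kept)
    print("soft reward:", shaped_reward(False, actions[kept[0]], predicted))
```

In this sketch the best-scoring action is retained whenever the random mask would discard every candidate, so the agent always has at least one action to take.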

HAN Zheng, XU Ruzhi, LIU Xiaohua. A multi-hop knowledge graph reasoning method based on reinforcement learning[J]. Computer Engineering & Science, 2026, 48(2): 256-267.
