Computer Engineering & Science (计算机工程与科学), 2026, Vol. 48, Issue (2): 256-267.

• Artificial Intelligence and Data Mining •

A multi-hop knowledge graph reasoning method based on reinforcement learning

HAN Zheng, XU Ruzhi, LIU Xiaohua


  

  1. (School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China)

  • Received: 2024-04-26 Revised: 2024-10-27 Online: 2026-02-25 Published: 2026-03-10
  • Funding:
    National Natural Science Foundation of China (62372173)

Abstract: In recent years, reinforcement learning has shown promising performance on knowledge reasoning tasks, but it faces two key challenges: agents tend to explore aimlessly, and rewards are delayed and sparse. To address these challenges, a multi-hop knowledge reasoning model based on reinforcement learning and predictive information embedding is proposed. First, a predictive embedding information acquisition module is designed, and the predictive information it produces is embedded into the reinforcement learning framework, resolving the problem of agents falling into aimless exploration and selecting ineffective actions. Then, an action pruning mechanism that combines predictive information with the idea of Dropout is added to the walk process to alleviate the problem of an excessively large action space, and an LSTM is used to retain the agent's historical decision information, so that the agent can select the most promising action at each step. Finally, a new reward function designed from the predictive information mitigates the problems of delayed and sparse rewards. Experimental results on the WebQSP, PQL, and MetaQA datasets show that the proposed model achieves high performance on knowledge reasoning tasks and is well suited to multi-hop question answering over knowledge graphs.


Key words: knowledge graph, reinforcement learning, knowledge reasoning
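
The abstract names two mechanisms without giving formulas: action pruning that combines predictive information with the idea of Dropout, and a reward function built from predictive information. The following is only a minimal sketch of how such mechanisms could look, not the authors' implementation. Every name here (`cosine`, `prune_actions`, `shaped_reward`, `keep_k`, `drop_rate`, `weight`) is hypothetical, and the choices of cosine similarity as the prediction score and of a linearly weighted soft reward are assumptions made for illustration. The LSTM history encoder and the policy network are omitted.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def prune_actions(candidates, predicted, keep_k, drop_rate, rng):
    """Hypothetical pruning step (an assumption, not the paper's rule).

    candidates maps each action, e.g. a (relation, entity) pair, to the
    embedding of its target entity. Actions are ranked by similarity to the
    predicted embedding, the top keep_k are kept, and each survivor is then
    dropped with probability drop_rate. If every survivor is dropped, the
    single best-ranked action is returned so the agent can always move.
    """
    ranked = sorted(
        candidates,
        key=lambda action: cosine(candidates[action], predicted),
        reverse=True,
    )
    top = ranked[:keep_k]
    kept = [action for action in top if rng.random() >= drop_rate]
    return kept or top[:1]


def shaped_reward(reached_answer, similarity, weight=0.5):
    """Hypothetical reward: 1.0 on reaching the answer entity, otherwise a
    soft reward proportional to the similarity clipped to [0, 1]."""
    if reached_answer:
        return 1.0
    return weight * max(0.0, min(1.0, similarity))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the random masking is repeatable
    predicted = [1.0, 0.0]  # stand-in for the predicted answer embedding
    candidates = {
        ("born_in", "Paris"): [0.9, 0.1],
        ("born_in", "Lyon"): [0.1, 0.9],
        ("works_at", "CNRS"): [-1.0, 0.0],
    }
    print(prune_actions(candidates, predicted, keep_k=2, drop_rate=0.3, rng=rng))
    print(shaped_reward(False, cosine(candidates[("born_in", "Paris")], predicted)))
```

In this sketch, ranking by the prediction score narrows the action space, the random masking keeps some exploration among the survivors, and the soft reward gives the agent a nonzero signal on walks that end near, but not at, the answer entity.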