• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2020, Vol. 42 ›› Issue (09): 1697-1703.

• Artificial Intelligence and Data Mining •

基于强化学习的多样性文档排序算法

官蕊1,2,丁家满1,2,贾连印1,2,游进国1,2,姜瑛1,2   

  1. (1.昆明理工大学信息工程与自动化学院,云南 昆明 650500;2.云南省人工智能重点实验室,云南 昆明 650500)

  • 收稿日期:2019-10-10 修回日期:2020-03-24 接受日期:2020-09-25 出版日期:2020-09-25 发布日期:2020-09-25
  • Supported by:
    National Natural Science Foundation of China (61562054)

A diversity document ranking algorithm based on reinforcement learning

GUAN Rui1,2,DING Jia-man1,2,JIA Lian-yin1,2,YOU Jin-guo1,2,JIANG Ying1,2   

  1. (1.Faculty of Information Engineering and Automation,Kunming University of Science and Technology,Kunming 650500;

    2.Artificial Intelligence Key Laboratory of Yunnan Province,Kunming 650500,China)

  • Received:2019-10-10 Revised:2020-03-24 Accepted:2020-09-25 Online:2020-09-25 Published:2020-09-25

摘要: 在排序学习方法中,通过直接优化信息检索评价指标来学习排序模型的方法,取得了很好的排序效果,但是其损失函数在利用所有排序位置信息以及融合多样性排序因素方面还有待提高。为此,提出基于强化学习的多样性文档排序算法。首先,将强化学习思想应用于文档排序问题,通过将排序行为建模为马尔可夫决策过程,在每一次迭代过程中利用所有排序位置的信息,不断为每个排序位置选择最优的文档。其次,在排序过程中结合多样性策略,依据相似度阈值,裁剪高度相似的文档,从而保证排序结果的多样性。最后,在公共数据集上的实验结果表明,提出的算法在保证排序准确性的同时,增强了排序结果的多样性。

关键词: 强化学习, 排序学习, 马尔可夫决策过程, 多样性, 策略梯度

Abstract: Among learning-to-rank methods, those that learn the ranking model by directly optimizing information retrieval evaluation metrics achieve good ranking performance, but their loss functions still leave room for improvement in exploiting information from all ranking positions and in incorporating diversity factors. To address this, a diversity document ranking algorithm based on reinforcement learning is proposed. First, reinforcement learning is applied to the document ranking problem: the ranking behavior is modeled as a Markov decision process, and in each iteration the information from all ranking positions is used to select the optimal document for each position in turn. Second, a diversity strategy is incorporated into the ranking process: highly similar documents are pruned according to a similarity threshold, which ensures the diversity of the ranking results. Finally, experimental results on a public dataset show that the proposed algorithm enhances the diversity of the ranking results while maintaining ranking accuracy.


Key words: reinforcement learning, learning to rank, Markov decision process, diversity, policy gradient
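
The ranking procedure outlined in the abstract (choose one document per position, then prune candidates that are too similar to it) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of cosine similarity, the default threshold of 0.9, and the greedy choice by a fixed score are all assumptions made for illustration. In the paper the choice at each position comes from a policy learned with policy gradient, which the fixed `scores` dictionary stands in for here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_with_pruning(docs, scores, threshold=0.9):
    """Build a ranking position by position.

    docs:      dict mapping document id -> feature vector (hypothetical input format)
    scores:    dict mapping document id -> relevance score; a stand-in for the
               learned policy's preference (assumption, not the paper's policy)
    threshold: candidates whose similarity to the chosen document is at or
               above this value are pruned (assumed semantics)
    """
    candidates = set(docs)   # state: the documents still available
    ranking = []             # state: the positions filled so far
    while candidates:
        # Action: pick the highest-scoring remaining document
        # (sorted() makes tie-breaking deterministic).
        best = max(sorted(candidates), key=lambda d: scores[d])
        ranking.append(best)
        candidates.remove(best)
        # Diversity step: drop candidates too similar to the chosen document.
        candidates = {
            d for d in candidates
            if cosine(docs[d], docs[best]) < threshold
        }
    return ranking


docs = {"a": [1.0, 0.0], "b": [0.99, 0.1], "c": [0.0, 1.0]}
scores = {"a": 0.9, "b": 0.8, "c": 0.5}
print(rank_with_pruning(docs, scores))
```

In this toy input, document `b` is nearly parallel to `a`, so it is pruned once `a` is placed; raising the threshold above 1 disables pruning and yields a plain score-ordered ranking.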