• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (1): 140-149.

• Artificial Intelligence and Data Mining • Previous Articles     Next Articles

A staged strategy incorporating reinforcement learning to solve the travelling thief problem

ZHANG Zheng1,2,XIA Xiaoyun2,CHEN Zefeng3,XIANG Yi4   

  1. (1.School of Computer Science and Technology,Zhejiang Sci-Tech University,Hangzhou 310018;
    2.School of Artificial Intelligence,Jiaxing University,Jiaxing 314001;
    3.School of Artificial Intelligence,Sun Yat-sen University,Zhuhai 519082;
    4.School of Software Engineering,South China University of Technology,Guangzhou 510006,China)n
  • Received:2023-08-23 Revised:2024-04-23 Online:2025-01-25 Published:2025-01-18

Abstract: The travelling thief problem (TTP) is a combination of the traditional traveling salesman problem(TSP) and the knapsack problem(KP), which is an NP-hard problem. Compared with the independent TSP and KP, the TTP is more realistic and has higher research value. Previous TTP solving algorithms are mainly heuristic algorithms with limited performance, and other types of algorithms are less studied. To acquire better solution for TTP, a staged strategy of incorporating reinforcement learning is proposed. The first stage generates an item selection plan based on the properties of items. The second stage uses a reinforcement learning algorithm (Actor-Critic algorithm) to solve the travel path. The third stage introduces neighborhood search strategy to optimize the obtained solution. Experiments show that the proposed algorithm achieves good results on most test cases and, in some cases, outperforms the compared algorithms in terms of solution quality, demonstrating the superior performance of the proposed algorithm. 

Key words: reinforcement learning, travelling thief problem (TTP), Actor-Critic algorithm, combinatorial optimizatio