• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

计算机工程与科学

• 人工智能与数据挖掘 • 上一篇    

基于有效实例的改进U树算法

宋佳佳,王作为   

  1. (天津工业大学计算机与软件学院,天津 300387)
  • 收稿日期:2017-09-12 修回日期:2018-03-12 出版日期:2019-01-25 发布日期:2019-01-25

A modified U-tree algorithm
based on effective instances
 

SONG Jiajia,WANG Zuowei   

  1. (School of Computer Science and Software Engineering,Tianjin Polytechnic University,Tianjin 300387,China)
  • Received:2017-09-12 Revised:2018-03-12 Online:2019-01-25 Published:2019-01-25

摘要:

传统U-Tree算法对于部分观测马尔可夫决策过程POMDP问题的解决已取得较为显著的成效,但是由于边缘节点生长过于随意,所以仍存在树的规模庞大、内存需求比较大、计算复杂度过高的问题。在原U-Tree算法的基础上,通过得到下一步观测值,来划分同一个叶子节点中做相同动作的实例,提出了一种基于有效实例来扩展边缘节点的EIU-Tree算法,大大缩减了计算规模,以此来帮助智能体更好更快地学习,并且在4×3经典栅格问题中做了仿真实验,对比于原有的U-Tree算法,该算法运行效果更好。
 
 

关键词: 部分观测马尔可夫决策过程, 强化学习, U-树, Q-学习算法

Abstract:

The traditional U-tree algorithm has achieved remarkable results in solving the problem of partially observable Markov decision process (POMDP), however, because of excessive random growth of fringe nodes, some problems such as large scale trees, large memory requirement and high computational complexity, still remain. Based on the original U-Tree algorithm, we classify the instances of the same leaf node which do the same action after obtaining the observation value, and propose an effective instance U-tree algorithm which extends fringe nodes based on effective instances. It greatly reduces computational scale to help the agent to learn faster and better. Simulation experiments are carried out on the classic 4×3 grid problem, and experimental results show that the algorithm outperforms the original u-Tree algorithm.

 

Key words: partially observable Markov decision process;reinforcement learning ;U-tree, Q-learning algorithm