Region enhanced neural Q-learning for solving model-based POMDPs

Marco Wiering, Thijs Kooi

Year of publication: 2010
Citations: 3

Abstract

For a robot to perform tasks autonomously, it has to plan its behavior and make decisions based on the input it receives. Unfortunately, contemporary robot sensors and actuators are subject to noise, rendering optimal decision making a stochastic process. To model this process, partially observable Markov decision processes (POMDPs) can be applied. In this paper we introduce the RENQ algorithm, a new POMDP algorithm that combines neural networks for estimating Q-values with the construction of a spatial pyramid over the state space. RENQ uses region-based belief vectors together with state-based belief vectors as inputs to a neural network trained with Q-learning. We compare RENQ to Qmdp and Perseus, two state-of-the-art algorithms for approximately solving model-based POMDPs. The results on three different maze navigation tasks indicate that RENQ outperforms Perseus on all problems and outperforms Qmdp as the problems become larger.
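The idea of feeding a network both state-based and region-based belief vectors can be illustrated with a minimal sketch. The abstract does not specify how the spatial pyramid is built, so everything here is an assumption made for illustration: the maze is taken to be a rectangular grid, each pyramid level partitions it into non-overlapping square blocks, and a region's belief is the sum of the beliefs of the states it contains. The function names `region_beliefs` and `renq_style_input` and the block sizes are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def region_beliefs(belief: List[List[float]], block: int) -> List[float]:
    """Sum a grid-shaped belief over non-overlapping block x block regions.

    Assumption: regions are square blocks of the grid; blocks at the border
    are clipped when the grid size is not a multiple of `block`.
    """
    rows, cols = len(belief), len(belief[0])
    out = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            total = sum(
                belief[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            out.append(total)
    return out


def renq_style_input(
    belief: List[List[float]], blocks: Sequence[int] = (2, 4)
) -> List[float]:
    """Concatenate the state-level belief with one region-level belief per level."""
    features = [p for row in belief for p in row]  # state-based belief vector
    for block in blocks:  # one pyramid level per block size (hypothetical choice)
        features.extend(region_beliefs(belief, block))
    return features


# Toy belief over a 4x4 maze: the probability mass sits on three cells.
belief = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.25, 0.0],
    [0.0, 0.0, 0.25, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
features = renq_style_input(belief)
print(len(features))    # 16 state entries + 4 regions (2x2) + 1 region (4x4)
print(features[16:20])  # belief mass per 2x2 region
```

In this sketch the resulting feature vector would be the input to a Q-value network; the network and the Q-learning update themselves are omitted because the abstract gives no details about them.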

Keywords

Partially observable Markov decision process; Computer science; Markov decision process; Artificial intelligence; Rendering (computer graphics); Robot; Artificial neural network; State (computer science); Observable; Markov process

