A heuristic Q-learning architecture for fully exploring a world and deriving an optimal policy by model-based planning
Gang Zhao, Shoji Tatsumi, Ruoying Sun
- Year: 2003
- Citations: 2
Abstract
For solving Markov decision processes with incomplete information in robot learning tasks, model-based algorithms make effective use of gathered data but usually require extensive computation. Dyna-Q is an architecture that uses experience to build a model and simultaneously uses that model to adjust the policy; however, it does not help an agent explore an environment actively. In this paper, we present the Exa-Q architecture, which learns models and plans with the learned models to help the reinforcement learning agent explore an environment actively and improve its estimate of the reinforcement function. As a result, the Exa-Q architecture can identify an environment fully and speed up learning of the optimal policy. Experimental results demonstrate that the proposed method is efficient.
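To make the Dyna-Q baseline mentioned in the abstract concrete, the following is a minimal sketch of tabular Dyna-Q as it is commonly described: each real step updates the Q-table, records the transition in a learned model, and then replays a few transitions sampled from that model. It is not the Exa-Q architecture, whose details the abstract does not give. The corridor environment, the function names `dyna_q` and `corridor_step`, and all parameter values are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def dyna_q(step, start, actions, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q for a deterministic environment (illustrative sketch)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-values keyed by (state, action)
    model = {}              # learned model: (state, action) -> (reward, next_state, done)

    def greedy(state):
        # Pick a maximal-value action, breaking ties at random.
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = start, False
        while not done:
            # Epsilon-greedy action selection (undirected exploration).
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)
            update(s, a, r, s2, done)      # learn from real experience
            model[(s, a)] = (r, s2, done)  # record the transition in the model
            for _ in range(planning_steps):
                # Planning: replay transitions sampled from the learned model.
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return q, model


def corridor_step(state, action):
    """Hypothetical 5-cell corridor: states 0..4, goal at 4, actions -1 and +1."""
    nxt = min(4, max(0, state + action))
    done = nxt == 4
    return (1.0 if done else 0.0), nxt, done


q, model = dyna_q(corridor_step, start=0, actions=[-1, 1])
# Greedy policy for the non-terminal states.
policy = {s: max([-1, 1], key=lambda a: q[(s, a)]) for s in range(4)}
```

Exploration in this sketch is driven only by epsilon-greedy randomness: the model is used to propagate values, not to decide where to go next. That is the limitation the abstract attributes to Dyna-Q.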