
LEARNING

A heuristic Q-learning architecture for fully exploring a world and deriving an optimal policy by model-based planning

Gang Zhao, Shoji Tatsumi, Ruoying Sun

Year of publication: 2003
Citations: 2

Abstract

For solving Markov decision processes with incomplete information in robot learning tasks, model-based algorithms make effective use of gathered data but usually require extensive computation. Dyna-Q is an architecture that uses experience to build a model and simultaneously uses the model to adjust the policy; however, it does not help an agent explore an environment actively. In this paper, we present the Exa-Q architecture, which learns models and plans with the learned models to help the reinforcement learning agent explore an environment actively and improve its reinforcement function estimate. As a result, the Exa-Q architecture can identify an environment fully and speed up the derivation of the optimal policy. Experimental results demonstrate that the proposed method is efficient.
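The Dyna-Q loop mentioned in the abstract can be illustrated with a minimal tabular sketch. This is standard Dyna-Q only, not the authors' Exa-Q method, whose active-exploration mechanism is not specified on this page. The corridor environment, the function names (`dyna_q`, `corridor_step`), and all hyperparameter values are assumptions made for illustration.

```python
import random
from collections import defaultdict


def corridor_step(state, action):
    """Hypothetical toy world: states 0..4 in a row, goal at state 4."""
    next_state = min(max(state + action, 0), 4)
    done = next_state == 4
    reward = 1.0 if done else 0.0
    return reward, next_state, done


def dyna_q(step, start, actions, episodes=50, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # q[(state, action)] -> estimated return
    model = {}              # model[(state, action)] -> (reward, next_state, done)

    def greedy(s):
        # Pick a maximizing action, breaking ties at random.
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = start, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)
            update(s, a, r, s2, done)        # learn from real experience
            model[(s, a)] = (r, s2, done)    # record it (deterministic model)
            for _ in range(planning_steps):  # replay simulated experience
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return q, model


q, model = dyna_q(corridor_step, start=0, actions=[-1, 1])
policy = {s: max([-1, 1], key=lambda a: q[(s, a)]) for s in range(4)}
print(policy)
```

Note that the planning loop only replays state-action pairs the agent has already tried, and exploration here is plain epsilon-greedy. That is the limitation the abstract points to: the model improves the policy but does not direct the agent toward unvisited parts of the environment.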

Keywords

Reinforcement learning, Computer science, Architecture, Heuristic, Markov decision process, Artificial intelligence, Q-learning, Function (biology), Machine learning, Computation

