A heuristic Q-learning architecture for fully exploring a world and deriving an optimal policy by model-based planning

Gang Zhao, Shoji Tatsumi, Ruoying Sun

Year of publication
2003
Citations
2

Abstract

When solving Markov decision processes with incomplete information in robot learning tasks, model-based algorithms make effective use of gathered data but usually require extensive computation. Dyna-Q is an architecture that uses experience to build a model and simultaneously uses the model to adjust the policy; however, it does not help an agent explore an environment actively. In this paper, we present the Exa-Q architecture, which learns models and plans with the learned models to help the reinforcement learning agent explore an environment actively and improve its estimate of the reinforcement function. As a result, the Exa-Q architecture can identify an environment fully and speed up the derivation of the optimal policy. Experimental results demonstrate that the proposed method is efficient.
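
To make the Dyna-Q baseline mentioned in the abstract concrete, the following is a minimal sketch of standard tabular Dyna-Q, not of Exa-Q itself, whose exploration mechanism the abstract does not specify. The corridor world, the `step` function, and all parameter values are assumptions chosen purely for illustration.

```python
import random
from collections import defaultdict

# Hypothetical 1-D corridor (an assumption for illustration):
# states 0..N_STATES-1, with the goal at the right end.
N_STATES = 6
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Deterministic toy dynamics: reward 1.0 on reaching the goal, else 0.0."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def dyna_q(episodes=50, planning_steps=10, alpha=0.5, gamma=0.9,
           epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)]
    model = {}              # learned model: (state, action) -> (next_state, reward, done)

    def greedy(s):
        # Greedy action with random tie-breaking.
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])

    def update(s, a, s2, r, done):
        # One-step Q-learning backup.
        target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(s)
            s2, r, done = step(s, a)
            update(s, a, s2, r, done)        # learn from real experience
            model[(s, a)] = (s2, r, done)    # record the observed transition
            for _ in range(planning_steps):  # plan with simulated experience
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return q, model
```

Calling `dyna_q()` returns the learned action values and the learned model. Note that planning here only replays state-action pairs the agent has already tried, which reflects the limitation the abstract points out: the model refines the policy but does not steer the agent toward unexplored parts of the environment.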

Keywords

Reinforcement learning, Computer science, Architecture, Heuristic, Markov decision process, Artificial intelligence, Q-learning, Function (biology), Machine learning, Computation
