Parallel Reinforcement Learning Systems Using Exploration Agents and Dyna-Q Algorithm
Takeshi Tateyama, Seiichi Kawata, Y. Shimomura
- Publication year: 2007
- Citations: 10
Abstract
We propose a new strategy for parallel reinforcement learning. With this strategy, the optimal value function and policy can be constructed more quickly than with traditional strategies. We define two types of agents: exploitation agents and exploration agents. The exploitation agents select actions mainly for the purpose of exploitation, while the exploration agents concentrate on exploration using the extended κ-certainty exploration method. These agents learn in the same environment in parallel, periodically combine their value functions, and execute Dyna-Q. This strategy is expected to lead to the construction of the optimal value function and enables the exploration agents to quickly select the optimal actions. Experimental results from a mobile robot simulation demonstrate the applicability of our method.
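The scheme described in the abstract can be illustrated with a minimal tabular sketch. Everything concrete here is an assumption made for illustration, not taken from the paper: the six-state corridor environment, the names `Agent`, `merge`, and `run`, the hyperparameters, and the merge rule (per-entry maximum of the Q-tables plus pooled models). The abstract does not define the extended κ-certainty exploration method, so the exploration agent below uses a simple least-tried-action rule as a stand-in. The agents also run one after another rather than truly in parallel.

```python
import random
from collections import defaultdict

N_STATES, GOAL = 6, 5        # hypothetical corridor: states 0..5, reward at state 5
ACTIONS = (-1, +1)           # move left / move right
ALPHA, GAMMA = 0.5, 0.9      # assumed learning rate and discount factor

def step(state, action):
    """Deterministic corridor dynamics; reward 1.0 on reaching GOAL."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

class Agent:
    def __init__(self, explorer, rng):
        self.q = defaultdict(float)     # Q[(state, action)]
        self.counts = defaultdict(int)  # how often each (state, action) was tried
        self.model = {}                 # learned model: (s, a) -> (s', r)
        self.explorer = explorer
        self.rng = rng

    def choose(self, s):
        if self.explorer:
            # Stand-in for kappa-certainty exploration: take the least-tried action.
            return min(ACTIONS, key=lambda a: (self.counts[(s, a)], self.rng.random()))
        if self.rng.random() < 0.1:     # exploitation agent: epsilon-greedy
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: (self.q[(s, a)], self.rng.random()))

    def update(self, s, a, r, s2):
        # One-step Q-learning backup; no bootstrapping from the terminal state.
        future = 0.0 if s2 == GOAL else max(self.q[(s2, b)] for b in ACTIONS)
        self.q[(s, a)] += ALPHA * (r + GAMMA * future - self.q[(s, a)])

    def learn(self, s, a, r, s2, planning_steps=5):
        self.counts[(s, a)] += 1
        self.model[(s, a)] = (s2, r)
        self.update(s, a, r, s2)                 # direct RL update
        for _ in range(planning_steps):          # Dyna-Q planning from the model
            (ps, pa), (ps2, pr) = self.rng.choice(list(self.model.items()))
            self.update(ps, pa, pr, ps2)

def merge(agents):
    """Assumed merge rule: per-entry maximum of Q-values, union of models."""
    keys = set().union(*(a.q.keys() for a in agents))
    merged_q = {k: max(a.q.get(k, 0.0) for a in agents) for k in keys}
    merged_model = {}
    for a in agents:
        merged_model.update(a.model)
    for a in agents:
        a.q = defaultdict(float, merged_q)
        a.model = dict(merged_model)

def run(episodes=50, merge_every=5, seed=0):
    rng = random.Random(seed)
    agents = [Agent(explorer=False, rng=rng), Agent(explorer=True, rng=rng)]
    for ep in range(1, episodes + 1):
        for agent in agents:                     # sequential stand-in for parallel runs
            s = 0
            for _ in range(50):
                a = agent.choose(s)
                s2, r = step(s, a)
                agent.learn(s, a, r, s2)
                s = s2
                if s == GOAL:
                    break
        if ep % merge_every == 0:                # periodic combination of value functions
            merge(agents)
    return agents[0]                             # the exploitation agent

exploiter = run()
greedy = [max(ACTIONS, key=lambda a: exploiter.q[(s, a)]) for s in range(GOAL)]
print(greedy)
```

In this toy corridor the greedy action of the exploitation agent should be `+1` (move right) in every non-terminal state once the merged value function has converged. The per-entry maximum is only safe here because the environment is deterministic and rewards are non-negative; a count-weighted average would be a more cautious choice in stochastic settings.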