Competitive-Cooperative-Concurrent Reinforcement Learning with Importance Sampling
Eiji Uchibe, Kenji Doya
- Year: 2004
- Citations: 98
Abstract
The speed and performance of learning depend on the complexity of the learner. A simple learner with few parameters and no internal states can quickly obtain a reactive policy, but its performance is limited. A learner with many parameters and internal states may eventually achieve high performance, but learning may take an enormous amount of time. It is therefore difficult to decide in advance which architecture and algorithm should be used for a new task. In this paper, we propose a new framework that selects an appropriate policy from a set of heterogeneous reinforcement learning modules and uses importance sampling to correctly improve the policies of all learning modules, including those not selected. In this framework, multiple heterogeneous learning modules sharing the same sensory-motor system compete to act and cooperate to learn, allowing the overall learning system to reach good performance faster. We show, in a simulation of a partially observable pole-balancing task and in robotic experiments on battery-pack foraging and partially observable T-maze tasks, that a complex learning module trained with the proposed method can learn faster than when it is trained alone, by exploiting task-relevant episodes generated by suboptimal but fast-learning modules.
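The core idea in the abstract, letting one module act while every module learns from the resulting episode, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the episode representation (per-step action probabilities plus a scalar return), the softmax selection rule, and the use of ordinary (unnormalized) importance sampling are all assumptions made for illustration.

```python
import math
import random


def importance_weight(target_probs, behavior_probs):
    """Product over one episode of pi_target(a|s) / pi_behavior(a|s)."""
    w = 1.0
    for p_t, p_b in zip(target_probs, behavior_probs):
        w *= p_t / p_b
    return w


def off_policy_return_estimate(episodes):
    """Ordinary importance-sampling estimate of a target policy's expected return.

    Each episode is a tuple (target_probs, behavior_probs, episode_return), where
    the probabilities are those of the actions actually taken by the behavior
    (selected) module. This episode format is a hypothetical choice.
    """
    total = 0.0
    for target_probs, behavior_probs, ret in episodes:
        total += importance_weight(target_probs, behavior_probs) * ret
    return total / len(episodes)


def select_module(values, temperature=1.0, rng=random):
    """Pick which module acts, by softmax over per-module value estimates.

    The softmax rule is an assumed selection scheme, not taken from the paper.
    """
    m = max(values)  # subtract the max for numerical stability
    prefs = [math.exp((v - m) / temperature) for v in values]
    r = rng.random() * sum(prefs)
    acc = 0.0
    for i, p in enumerate(prefs):
        acc += p
        if r < acc:
            return i
    return len(values) - 1
```

In this sketch, a module that was not selected can still evaluate its own policy: it reweights the observed return by the ratio of its action probabilities to those of the module that actually acted, which is what allows a slow, complex learner to benefit from episodes produced by a fast, simple one.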