Competitive-Cooperative-Concurrent Reinforcement Learning with Importance Sampling
Eiji Uchibe, Kenji Doya
- Publication year
- 2004
- Citations
- 98
Abstract
The speed and performance of learning depend on the complexity of the learner. A simple learner with few parameters and no internal states can quickly obtain a reactive policy, but its performance is limited. A learner with many parameters and internal states may eventually achieve high performance, but learning may take an enormous amount of time. It is therefore difficult to decide in advance which architecture and algorithm to use for a new task. In this paper, we propose a new framework for selecting an appropriate policy from a set of heterogeneous reinforcement learning modules and for correctly improving the policies of all learning modules, including those not selected, using importance sampling. In this framework, multiple heterogeneous learning modules sharing the same sensory-motor system compete to act and cooperate to learn, allowing the overall learning system to reach good performance faster. We show, in a simulation of a partially observable pole-balancing task and in robotic experiments on battery-pack foraging and partially observable T-maze tasks, that a complex learning module trained with the proposed method can learn faster than when it is trained alone, by exploiting task-relevant episodes generated by suboptimal but fast-learning modules.
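The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea of importance sampling across modules, assuming tabular policies and the textbook ordinary importance-sampling estimator. The names `episode_weight` and `off_policy_return_estimate`, and the toy policies `simple` and `complex_`, are hypothetical and not taken from the paper. It shows how an episode generated by the selected (behavior) module could be reweighted so that a module that was not selected can still evaluate its own policy.

```python
def episode_weight(target_policy, behavior_policy, episode):
    """Importance weight of one episode: the product over steps of
    pi_target(a|s) / pi_behavior(a|s).

    Policies are dicts mapping a state to a list of action probabilities
    (an assumed representation, chosen for illustration only).
    """
    weight = 1.0
    for state, action in episode:
        weight *= target_policy[state][action] / behavior_policy[state][action]
    return weight


def off_policy_return_estimate(target_policy, behavior_policy, episodes):
    """Ordinary importance-sampling estimate of the target policy's
    expected return from episodes generated by the behavior policy.

    `episodes` is a list of (steps, episode_return) pairs, where `steps`
    is a list of (state, action) tuples.
    """
    total = 0.0
    for steps, episode_return in episodes:
        total += episode_weight(target_policy, behavior_policy, steps) * episode_return
    return total / len(episodes)


# Hypothetical example: a fast "simple" module acted, and a "complex"
# module reuses the resulting episode to evaluate its own policy.
simple = {"s": [0.25, 0.75]}    # behavior policy (was selected to act)
complex_ = {"s": [0.5, 0.5]}    # target policy (was not selected)
steps = [("s", 0), ("s", 1)]
weight = episode_weight(complex_, simple, steps)
estimate = off_policy_return_estimate(complex_, simple, [(steps, 3.0)])
```

The behavior policy must assign nonzero probability to every action it actually took, otherwise the ratio is undefined. Products of ratios over long episodes can also have high variance, which is a general property of this estimator rather than a claim about the paper's method.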