Prioritized Sampling with Intrinsic Motivation in Multi-Task Reinforcement Learning
Carlo D’Eramo, Georgia Chalvatzaki
- Publication year
- 2022
- Citations
- 3
Abstract
Deep Reinforcement Learning (RL) promises to lead the next advances towards the development of future intelligent agents. However, the unprecedented representational power of deep function approximators, e.g., deep neural networks, comes at the cost of demanding a huge amount of experience, making deep RL impractical for applications requiring interactions with the real world. We study the problem of using samples more efficiently in deep RL, exploiting the desirable properties of knowledge generalization that result from learning multiple tasks together. The outcome of our work is the coupling of multi-task RL algorithms with a task-sampling policy based on the well-known intrinsic motivation paradigm. In particular, we leverage the temporal-difference (TD) error of Bellman updates as an effective measure of learning progress, to prioritize sampling from the tasks that contribute the most to the learning of the agent. This sampling strategy speeds up the learning of tasks on which the agent is showing progress and postpones the learning of the remaining ones, resulting in an optimized collection of samples. Our method is supported by experimental evaluations on well-known RL control tasks, for which our approach shows superior sample efficiency and performance compared to representative baselines. Finally, we evaluate our approach on simulated control tasks based on Quanser robotics systems, confirming the advantages over the baselines in more realistic applications as well.
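The task-sampling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact prioritization rule, so everything here is an assumption made for illustration: the class name `TDErrorTaskSampler`, the use of a moving average of absolute TD-errors as the per-task priority, and the softmax with a `temperature` parameter used to turn priorities into sampling probabilities. It is not the authors' implementation.

```python
import math
import random


class TDErrorTaskSampler:
    """Hypothetical sampler: tasks with larger recent |TD-error| are drawn more often."""

    def __init__(self, n_tasks, temperature=1.0, decay=0.9, seed=None):
        # Equal initial priorities so every task starts with the same probability (assumption).
        self.priorities = [1.0] * n_tasks
        self.temperature = temperature
        self.decay = decay
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax over priorities; the maximum is subtracted for numerical stability.
        top = max(self.priorities)
        exps = [math.exp((p - top) / self.temperature) for p in self.priorities]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self):
        # Draw the index of the next task to collect experience from.
        tasks = range(len(self.priorities))
        return self.rng.choices(tasks, weights=self.probabilities(), k=1)[0]

    def update(self, task, td_errors):
        # Exponential moving average of the mean absolute TD-error for this task.
        # td_errors must be a non-empty sequence of floats.
        mean_abs = sum(abs(d) for d in td_errors) / len(td_errors)
        self.priorities[task] = (
            self.decay * self.priorities[task] + (1.0 - self.decay) * mean_abs
        )


sampler = TDErrorTaskSampler(n_tasks=3, seed=0)
sampler.update(0, [5.0, -5.0])  # task 0 reports large TD-errors
next_task = sampler.sample()    # task 0 is now the most likely draw
```

In a multi-task training loop, `update` would be called with the TD-errors computed during each task's Bellman updates, and `sample` would choose which task to interact with next. A lower `temperature` concentrates sampling on the highest-priority tasks, while a higher one approaches uniform sampling.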