Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback
Guangliang Li, Bo He, Randy Gómez, Keisuke Nakamura
- Year: 2018
- Citations: 19
Abstract
Programming robots to perform tasks in the real world is difficult because of the world's richness and uncertainty. For robots and agents to be more useful, they must be able to learn quickly from ordinary people via natural interactions. In this paper, we investigate how an agent can learn from demonstration and from positive and negative evaluative feedback provided by a human teacher. Specifically, we propose a model-based method, IRL-TAMER, that combines learning from demonstration via inverse reinforcement learning (IRL) with learning from human reward via the TAMER framework. We tested our method in the Grid World domain and compared it with the TAMER framework using different discount factors on human reward. Our results suggest that although an agent learning via IRL can learn a useful value function indicating which states are good based on the demonstration, it cannot obtain an effective policy for navigating to the goal state from a single demonstration. However, learning from demonstration can reduce the amount of human reward needed to obtain an optimal policy, especially the amount of negative feedback. That is to say, learning from demonstration can give a jump-start to an agent's learning from human reward and reduce the number of mistakes (incorrect actions). Furthermore, our results show that learning from demonstration is useful for an agent's learning from human reward only when the discount factor is small, i.e., when learning from myopic human reward.
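The jump-start idea in the abstract can be illustrated with a toy sketch. This is not the authors' IRL-TAMER implementation: the 4x4 grid, the flat bonus that stands in for an IRL-derived value function, the simulated teacher, and every name below (`init_reward_model`, `simulated_feedback`, `train`) are assumptions made for illustration. The agent acts greedily on a tabular human-reward model H(s, a), which corresponds to fully myopic learning (discount factor 0), and the sketch counts how much negative feedback it receives with and without a demonstration-based initialization.

```python
# Toy sketch only: all names, the grid layout, and the simulated teacher are
# hypothetical and are not taken from the paper.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SIZE, START, GOAL = 4, (0, 0), (3, 3)


def step(state, action):
    """Move one cell; a move off the grid leaves the state unchanged."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    return (r, c) if 0 <= r < SIZE and 0 <= c < SIZE else state


def distance(state):
    """Manhattan distance from state to GOAL."""
    return abs(GOAL[0] - state[0]) + abs(GOAL[1] - state[1])


def simulated_feedback(state, action):
    """Stand-in for a human teacher: +1 if the move gets closer to GOAL, else -1."""
    return 1.0 if distance(step(state, action)) < distance(state) else -1.0


def init_reward_model(demo_states=(), bonus=0.5):
    """Tabular H(s, a). Actions that land in a demonstrated state start with a
    flat bonus (an assumed stand-in for an IRL-derived value function)."""
    good = set(demo_states)
    return {
        ((r, c), a): (bonus if step((r, c), a) in good else 0.0)
        for r in range(SIZE)
        for c in range(SIZE)
        for a in ACTIONS
    }


def train(H, alpha=0.5, max_steps=500):
    """Act greedily on H (myopic), update H from feedback, count negative feedback."""
    state, negatives = START, 0
    for _ in range(max_steps):
        if state == GOAL:
            return True, negatives
        action = max(ACTIONS, key=lambda a: H[(state, a)])  # ties: first action wins
        h = simulated_feedback(state, action)
        if h < 0:
            negatives += 1
        H[(state, action)] += alpha * (h - H[(state, action)])  # incremental update
        state = step(state, action)
    return state == GOAL, negatives


# One demonstration: along the top row, then down the right column.
demo = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (2, 3), (3, 3)]
reached_plain, neg_plain = train(init_reward_model())
reached_demo, neg_demo = train(init_reward_model(demo))
print("negative feedback without demonstration:", neg_plain)
print("negative feedback with demonstration:", neg_demo)
```

In this toy setting the demonstration-seeded model still makes some mistakes, since the flat bonus marks which states are good but not which direction to move, yet it needs less corrective feedback than the model that starts from zero.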