An Efficient Unified Approach Using Demonstrations for Inverse Reinforcement Learning
Maxwell Hwang, Wei‐Cheng Jiang, Yu-Jen Chen, Kao‐Shing Hwang, Yi-Chia Tseng
- Publication year
- 2019
- Citations
- 5
Abstract
A reinforcement learning (RL) agent is typically equipped with a hand-designed reward function that corrects its policy toward optimal decision making through interactions with an environment. However, it is difficult to design a reward function appropriate for complex RL problems. To address this difficulty, inverse RL (IRL) was introduced as an efficient way to design a reward function based on input from knowledgeable experts. In IRL, experts provide demonstrations so that agents can imitate the demonstrated behavior. However, even incorrect demonstrations have merit: some of them resemble correct ones, and they give agents clues about which behaviors to avoid. This article introduces an IRL method that considers two types of demonstrations, correct and incorrect, in the function approximation of a reward function. Given the clues from these two opposing kinds of demonstration, agents can iteratively approximate a reward function that guides them toward the expert's correct demonstrations and also prevents them from making the same mistakes the expert made. The incorrect demonstrations provide agents with guidelines for avoiding erroneous motions in the initial phase. Two simulated tasks, a labyrinth and a robot soccer game, are used to validate the proposed method. The simulation results show that the proposed method generates an appropriate reward function and accomplishes apprenticeship learning with efficient learning time in IRL.
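The idea of learning a reward function from both correct and incorrect demonstrations can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract describes as iterative but does not specify. The sketch assumes one-hot state features, a linear reward, and a single-step weight estimate taken as the difference between the discounted feature expectations of the two demonstration sets. The function names, the five-state toy task, and the discount factor are all hypothetical.

```python
import numpy as np


def feature_expectations(trajectories, n_states, gamma=0.9):
    """Average discounted state-visitation counts over trajectories.

    Assumes one-hot state features, so the feature expectation is
    simply a discounted visitation count per state.
    """
    mu = np.zeros(n_states)
    for traj in trajectories:
        for t, state in enumerate(traj):
            mu[state] += gamma ** t
    return mu / len(trajectories)


def reward_from_demos(correct, incorrect, n_states, gamma=0.9):
    """Hypothetical linear reward: raise states seen in correct
    demonstrations and lower states seen in incorrect ones."""
    mu_pos = feature_expectations(correct, n_states, gamma)
    mu_neg = feature_expectations(incorrect, n_states, gamma)
    w = mu_pos - mu_neg
    norm = np.linalg.norm(w)
    # Normalize so the reward scale does not depend on demo length.
    return w / norm if norm > 0 else w


# Toy task (assumed): state 3 is the goal, state 4 is a trap.
correct = [[0, 1, 2, 3], [0, 1, 2, 3]]
incorrect = [[0, 1, 4], [0, 4]]
reward = reward_from_demos(correct, incorrect, n_states=5)
```

In this toy setting the goal state receives a positive reward and the trap state a negative one, while the shared start state is neutral because both kinds of demonstration visit it equally. An iterative method would refine such an estimate by comparing it against the agent's own behavior.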