Efficient Exploration of Reward Functions in Inverse Reinforcement\n Learning via Bayesian Optimization
Sreejith Balakrishnan, Quoc Phong Nguyen, Bryan Kian Hsiang Low, Harold Soh
- Year
- 2020
- Citations
- 7
- Access
- Open access
Abstract
The problem of inverse reinforcement learning (IRL) is relevant to a variety\nof tasks including value alignment and robot learning from demonstration.\nDespite significant algorithmic contributions in recent years, IRL remains an\nill-posed problem at its core; multiple reward functions coincide with the\nobserved behavior and the actual reward function is not identifiable without\nprior knowledge or supplementary information. This paper presents an IRL\nframework called Bayesian optimization-IRL (BO-IRL) which identifies multiple\nsolutions that are consistent with the expert demonstrations by efficiently\nexploring the reward function space. BO-IRL achieves this by utilizing Bayesian\nOptimization along with our newly proposed kernel that (a) projects the\nparameters of policy invariant reward functions to a single point in a latent\nspace and (b) ensures nearby points in the latent space correspond to reward\nfunctions yielding similar likelihoods. This projection allows the use of\nstandard stationary kernels in the latent space to capture the correlations\npresent across the reward function space. Empirical results on synthetic and\nreal-world environments (model-free and model-based) show that BO-IRL discovers\nmultiple reward functions while minimizing the number of expensive exact policy\noptimizations.\n
Keywords
Related papers
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Artificial intelligence: a modern approach
1995
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991
A new optimizer using particle swarm theory
R.C. Eberhart, James Kennedy
2002