Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization
Sreejith Balakrishnan, Quoc Phong Nguyen, Bryan Kian Hsiang Low, Harold Soh
- Publication year
- 2020
- Citations
- 7
- Access
- Open access
Abstract
The problem of inverse reinforcement learning (IRL) is relevant to a variety of tasks including value alignment and robot learning from demonstration. Despite significant algorithmic contributions in recent years, IRL remains an ill-posed problem at its core; multiple reward functions coincide with the observed behavior and the actual reward function is not identifiable without prior knowledge or supplementary information. This paper presents an IRL framework called Bayesian optimization-IRL (BO-IRL) which identifies multiple solutions that are consistent with the expert demonstrations by efficiently exploring the reward function space. BO-IRL achieves this by utilizing Bayesian Optimization along with our newly proposed kernel that (a) projects the parameters of policy invariant reward functions to a single point in a latent space and (b) ensures nearby points in the latent space correspond to reward functions yielding similar likelihoods. This projection allows the use of standard stationary kernels in the latent space to capture the correlations present across the reward function space. Empirical results on synthetic and real-world environments (model-free and model-based) show that BO-IRL discovers multiple reward functions while minimizing the number of expensive exact policy optimizations.
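The idea of projecting policy-invariant reward functions to one latent point and then applying a stationary kernel can be illustrated with a small sketch. This is not the kernel proposed in the paper, whose projection is not described in the abstract. As a stand-in, it assumes a linear reward parameter vector and uses only one well-known invariance: multiplying a reward function by a positive constant leaves the optimal policy unchanged. The function names `project_scale_invariant` and `latent_rbf_kernel` and the `lengthscale` parameter are hypothetical.

```python
import numpy as np

def project_scale_invariant(theta):
    """Map reward parameters to a latent point shared by all positive rescalings.

    Assumption for illustration only: normalizing to unit length stands in for
    the paper's projection, which the abstract does not specify.
    """
    theta = np.asarray(theta, dtype=float)
    norm = np.linalg.norm(theta)
    if norm == 0.0:
        raise ValueError("the zero reward vector has no direction to project")
    return theta / norm

def latent_rbf_kernel(theta_a, theta_b, lengthscale=1.0):
    """Standard stationary RBF kernel evaluated on the projected points."""
    z_a = project_scale_invariant(theta_a)
    z_b = project_scale_invariant(theta_b)
    sq_dist = np.sum((z_a - z_b) ** 2)
    return float(np.exp(-0.5 * sq_dist / lengthscale**2))

theta = np.array([1.0, -2.0, 0.5])
# A positive rescaling lands on the same latent point, so the kernel is maximal.
k_same = latent_rbf_kernel(theta, 3.0 * theta)
# The negated reward points the opposite way, so the correlation is much lower.
k_diff = latent_rbf_kernel(theta, -theta)
```

In this sketch a Gaussian process surrogate would treat `theta` and `3.0 * theta` as the same input, so Bayesian optimization would not spend a costly policy optimization on both.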