LEARNING

Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization

Sreejith Balakrishnan, Quoc Phong Nguyen, Bryan Kian Hsiang Low, Harold Soh

Publication year
2020
Citations
7
Access
Open access

Abstract

The problem of inverse reinforcement learning (IRL) is relevant to a variety of tasks including value alignment and robot learning from demonstration. Despite significant algorithmic contributions in recent years, IRL remains an ill-posed problem at its core; multiple reward functions coincide with the observed behavior and the actual reward function is not identifiable without prior knowledge or supplementary information. This paper presents an IRL framework called Bayesian optimization-IRL (BO-IRL) which identifies multiple solutions that are consistent with the expert demonstrations by efficiently exploring the reward function space. BO-IRL achieves this by utilizing Bayesian Optimization along with our newly proposed kernel that (a) projects the parameters of policy invariant reward functions to a single point in a latent space and (b) ensures nearby points in the latent space correspond to reward functions yielding similar likelihoods. This projection allows the use of standard stationary kernels in the latent space to capture the correlations present across the reward function space. Empirical results on synthetic and real-world environments (model-free and model-based) show that BO-IRL discovers multiple reward functions while minimizing the number of expensive exact policy optimizations.
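The idea in the abstract of projecting policy-invariant reward parameters to one latent point and then applying a stationary kernel can be illustrated with a minimal sketch. This is not the kernel proposed in the paper. It assumes a linear reward whose optimal policy is unchanged by positive scaling of the parameter vector, so the stand-in projection simply normalizes the vector to unit length. The names `project`, `latent_rbf`, `gp_posterior`, `toy_objective`, and `bo_loop` are hypothetical, the objective is a toy substitute for the expensive demonstration likelihood, and the upper-confidence-bound acquisition is chosen only for brevity.

```python
import numpy as np

def project(theta):
    """Stand-in projection (assumption): map each parameter vector to its
    direction, so theta and c * theta (c > 0) share one latent point."""
    theta = np.atleast_2d(np.asarray(theta, dtype=float))
    norms = np.linalg.norm(theta, axis=1, keepdims=True)
    return theta / np.maximum(norms, 1e-12)

def latent_rbf(a, b, length_scale=0.5):
    """Stationary RBF kernel evaluated on the projected (latent) points."""
    za, zb = project(a), project(b)
    sq_dist = ((za[:, None, :] - zb[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """Zero-mean GP posterior mean and standard deviation at query points Xs."""
    K = latent_rbf(X, X) + noise * np.eye(len(X))
    Ks = latent_rbf(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - (v**2).sum(0), 0.0, None)
    return mean, np.sqrt(var)

def toy_objective(theta):
    """Hypothetical stand-in for the demonstration likelihood: it depends
    only on the direction of theta, mimicking policy invariance."""
    z = project(theta)[0]
    target = np.array([0.6, 0.8])
    return float(z @ target)

def bo_loop(n_iter=15, seed=0):
    """Bayesian optimization over a fixed random candidate set using UCB."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-2.0, 2.0, size=(200, 2))
    X = candidates[:3].copy()
    y = np.array([toy_objective(t) for t in X])
    for _ in range(n_iter):
        mean, std = gp_posterior(X, y, candidates)
        nxt = candidates[np.argmax(mean + 2.0 * std)]
        X = np.vstack([X, nxt])
        y = np.append(y, toy_objective(nxt))
    return X, y
```

Calling `bo_loop()` returns the evaluated parameter vectors and their objective values. Because every parameter vector on the same ray maps to the same latent point, the surrogate treats them as one candidate, so an evaluation of one of them removes nearly all uncertainty about the others.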

Keywords

Bayesian optimization, Reinforcement learning, Computer science, Bayesian probability, Artificial intelligence, Machine learning, Space (punctuation), Kernel (algebra), Function (biology), Bayesian inference
