Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization

Sreejith Balakrishnan, Quoc Phong Nguyen, Bryan Kian Hsiang Low, Harold Soh

Year: 2020
Citations: 7
Access: Open access

Abstract

The problem of inverse reinforcement learning (IRL) is relevant to a variety of tasks including value alignment and robot learning from demonstration. Despite significant algorithmic contributions in recent years, IRL remains an ill-posed problem at its core; multiple reward functions coincide with the observed behavior and the actual reward function is not identifiable without prior knowledge or supplementary information. This paper presents an IRL framework called Bayesian optimization-IRL (BO-IRL) which identifies multiple solutions that are consistent with the expert demonstrations by efficiently exploring the reward function space. BO-IRL achieves this by utilizing Bayesian optimization along with our newly proposed kernel that (a) projects the parameters of policy-invariant reward functions to a single point in a latent space and (b) ensures nearby points in the latent space correspond to reward functions yielding similar likelihoods. This projection allows the use of standard stationary kernels in the latent space to capture the correlations present across the reward function space. Empirical results on synthetic and real-world environments (model-free and model-based) show that BO-IRL discovers multiple reward functions while minimizing the number of expensive exact policy optimizations.
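
The idea of placing a stationary kernel on a projected latent space can be illustrated with a minimal sketch. This is not the kernel proposed in the paper: the projection below (a softmax over the returns of a fixed trajectory set) and the names `project`, `latent_rbf_kernel`, and `trajectories` are assumptions chosen for illustration. The only invariance it demonstrates is a constant reward offset, which leaves the ranking of equal-length trajectories unchanged.

```python
import numpy as np

def project(reward, trajectories):
    """Hypothetical projection: softmax of trajectory returns.

    `reward` is a per-state reward vector; each trajectory is a tuple of
    state indices of equal length. Adding a constant to every reward shifts
    all returns equally, so the softmax output is unchanged.
    """
    returns = np.array([reward[list(t)].sum() for t in trajectories])
    z = returns - returns.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def latent_rbf_kernel(r1, r2, trajectories, lengthscale=1.0):
    """Standard RBF kernel evaluated on the projected (latent) points."""
    d = project(r1, trajectories) - project(r2, trajectories)
    return float(np.exp(-0.5 * np.dot(d, d) / lengthscale**2))

# Illustrative data (assumed, not from the paper).
trajectories = [(0, 1, 2), (0, 3, 2), (1, 3, 0)]
reward = np.array([0.1, 1.0, -0.5, 0.3])
shifted = reward + 5.0  # same trajectory ranking, different parameters

k_same = latent_rbf_kernel(reward, shifted, trajectories)
k_diff = latent_rbf_kernel(reward, -reward, trajectories)
```

In this sketch, `reward` and `shifted` land on the same latent point, so the kernel treats them as fully correlated, while `-reward` lands elsewhere and receives a lower kernel value. A Gaussian process built on such a kernel would not need to spend separate evaluations on reward functions that the projection identifies as equivalent.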

Keywords

Bayesian optimization, Reinforcement learning, Computer science, Bayesian probability, Artificial intelligence, Machine learning, Space (punctuation), Kernel (algebra), Function (biology), Bayesian inference
