Learning from Suboptimal Demonstration via Self-Supervised Reward Regression

Letian Chen, Rohan Paleja, Matthew Gombolay

Year: 2020
Citations: 3

Abstract

Learning from Demonstration (LfD) seeks to democratize robotics by enabling non-roboticist end-users to teach robots to perform a task by providing a human demonstration. However, modern LfD techniques, such as inverse reinforcement learning (IRL), assume users provide at least stochastically optimal demonstrations. This assumption fails to hold in all but the most isolated, controlled scenarios, reducing the ability to achieve the goal of empowering real end-users. Recent attempts to learn from suboptimal demonstration leverage pairwise rankings through Preference-based Reinforcement Learning (PbRL) to infer a more optimal policy than the demonstration. However, we show that these approaches make incorrect assumptions and, consequently, suffer from brittle, degraded performance. In this paper, we overcome the limitations of prior work by developing a novel computational technique that infers an idealized reward function from suboptimal demonstration and bootstraps suboptimal demonstrations to synthesize optimality-parameterized training data for training our reward function. We empirically validate that we can learn an idealized reward function with $\sim 0.95$ correlation with the ground-truth reward, versus only $\sim 0.75$ for prior work. We can then train policies achieving $\sim 200\%$ improvement over the suboptimal demonstration and $\sim 90\%$ improvement over prior work. Finally, we present a real-world implementation for teaching a robot to hit a topspin shot in table tennis better than the user demonstration.

Keywords

Computer science, Leverage (statistics), Reinforcement learning, Artificial intelligence, Machine learning, Pairwise comparison, Function (biology), Parameterized complexity, Robotics, Task (project management)

Related papers

Statistical Learning Theory

Artificial Intelligence: A Modern Approach

Applied Nonlinear Control

A new optimizer using particle swarm theory