AVID: Learning Multi-Stage Tasks via Pixel-Level Translation of Human Videos
Laura Smith, Nikita Dhawan, Marvin Zhang, Pieter Abbeel, Sergey Levine
- Year
- 2019
- Citations
- 2
- Access
- Open access
Abstract
Robotic reinforcement learning (RL) holds the promise of enabling robots to learn complex behaviors through experience. However, realizing this promise for long-horizon tasks in the real world requires mechanisms to reduce human burden in terms of defining the task and scaffolding the learning process. In this paper, we study how these challenges can be alleviated with an automated robotic learning framework, in which multi-stage tasks are defined simply by providing videos of a human demonstrator and then learned autonomously by the robot from raw image observations. A central challenge in imitating human videos is the difference in appearance between the human and robot, which typically requires manual correspondence. We instead take an automated approach and perform pixel-level image translation via CycleGAN to convert the human demonstration into a video of a robot, which can then be used to construct a reward function for a model-based RL algorithm. The robot then learns the task one stage at a time, automatically learning how to reset each stage to retry it multiple times without human-provided resets. This makes the learning process largely automatic, from intuitive task specification via a video to automated training with minimal human intervention. We demonstrate that our approach is capable of learning complex tasks, such as operating a coffee machine, directly from raw image observations, requiring only 20 minutes to provide human demonstrations and about 180 minutes of robot interaction.
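The stage-by-stage learning with automatic resets described in the abstract can be illustrated with a minimal control-flow sketch. This is not the authors' implementation: the function `train_stagewise` and the callbacks `attempt_stage`, `stage_succeeded`, and `reset_stage` are hypothetical names introduced here, and the sketch does not reproduce the CycleGAN translation, the reward construction, or the model-based RL algorithm. It only shows one plausible shape of the outer loop, in which a failed stage is reset and retried without a human stepping in.

```python
from typing import Callable, List, Tuple


def train_stagewise(
    num_stages: int,
    attempt_stage: Callable[[int], None],     # hypothetical: run the policy on one stage
    stage_succeeded: Callable[[int], bool],   # hypothetical: success check for that stage
    reset_stage: Callable[[int], None],       # hypothetical: return to the start of the stage
    max_attempts: int = 10,
) -> Tuple[bool, List[Tuple[str, int]]]:
    """Work through the stages in order, retrying a stage after each failure."""
    log: List[Tuple[str, int]] = []
    stage = 0
    attempts = 0
    while stage < num_stages and attempts < max_attempts:
        attempts += 1
        attempt_stage(stage)
        if stage_succeeded(stage):
            log.append(("advance", stage))
            stage += 1
        else:
            # No human-provided reset: undo the stage, then try it again.
            reset_stage(stage)
            log.append(("reset", stage))
    return stage == num_stages, log


def demo() -> Tuple[bool, List[Tuple[str, int]]]:
    """Toy stand-in for a robot: each stage fails a fixed number of times first."""
    fails_left = {0: 0, 1: 2, 2: 1}
    outcome = {}

    def attempt_stage(stage: int) -> None:
        if fails_left[stage] > 0:
            fails_left[stage] -= 1
            outcome[stage] = False
        else:
            outcome[stage] = True

    def stage_succeeded(stage: int) -> bool:
        return outcome[stage]

    def reset_stage(stage: int) -> None:
        pass  # a real system would execute a learned reset behavior here

    return train_stagewise(3, attempt_stage, stage_succeeded, reset_stage)


if __name__ == "__main__":
    done, log = demo()
    print(done, log)
```

In a real system, the success check would presumably come from the reward function built on the translated demonstration, and the reset would itself be a learned behavior, as the abstract states; both are stubbed out here.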