Safe and efficient imitation learning by clarification of experienced latent space
Hidehito Fujiishi, Taisuke Kobayashi, Kenji Sugimoto
- Publication year
- 2021
- Citations
- 6
Abstract
Behavioral cloning from observation (BCO) allows a robot to learn a policy without the expert's action information. However, it requires some interaction with the environment to infer the expert's actions, which carries a risk of robot failure. In addition, BCO assumes that the inferred actions are accurate, which can cause wrong and inefficient policy updates. Both problems can be addressed by outlier detection that determines whether the state currently faced has been experienced before. This paper proposes such outlier detection mechanisms, built on variational autoencoders (VAEs), to improve the safety and efficiency of standard BCO. For the safety problem, we assume that the expert's demonstrations visited only safe states; a VAE is then trained on the expert's state data to detect inexperienced and potentially dangerous scenes. For the efficiency problem, another VAE is trained on the state data safely collected by the imitator's policy to detect scenes where the inferred actions are not accurate. In handwriting robot experiments, the proposed mechanisms improved on standard BCO in terms of both safety (roughly 64%) and efficiency (roughly 44%). The high versatility of the proposed mechanisms is verified by learning various alphabet letters.
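The gating idea in the abstract, deciding whether a faced state counts as "experienced", can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the VAE score or its threshold is computed, so a nearest-neighbor distance stands in for the VAE-based novelty score here, and the names `novelty_score`, `calibrate_threshold`, `is_experienced`, the quantile rule, and the toy states are all hypothetical.

```python
import math


def novelty_score(state, reference_states):
    """Stand-in for a VAE-based outlier score (assumption, not the paper's method):
    the Euclidean distance from `state` to its nearest reference state."""
    return min(math.dist(state, ref) for ref in reference_states)


def calibrate_threshold(reference_states, quantile=0.95):
    """Pick a threshold from leave-one-out scores of the reference data.
    The quantile rule is an illustrative assumption."""
    scores = sorted(
        novelty_score(s, reference_states[:i] + reference_states[i + 1:])
        for i, s in enumerate(reference_states)
    )
    index = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[index]


def is_experienced(state, reference_states, threshold):
    """True if the state scores within the threshold, i.e. is not flagged as an outlier."""
    return novelty_score(state, reference_states) <= threshold


# Hypothetical 2-D "expert" states, assumed to be safe.
expert_states = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (0.3, 0.1), (0.4, 0.2)]
threshold = calibrate_threshold(expert_states)

near = is_experienced((0.25, 0.1), expert_states, threshold)  # close to the demonstrations
far = is_experienced((2.0, 2.0), expert_states, threshold)    # far from anything the expert visited
```

In the setting the abstract describes, one such detector would be fit on the expert's states (to flag possibly unsafe scenes) and a second on the states safely collected by the imitator (to flag scenes where inferred actions may be unreliable). How the flags are then used in the policy update is not specified in the abstract and is left out of this sketch.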