Efficient reuse of previous experiences in humanoid motor learning

Norikazu Sugimoto, Voot Tangkaratt, Thijs Wensveen, Tingting Zhao, Masashi Sugiyama, Jun Morimoto

Year of publication: 2014
Citations: 4

Abstract

In this study, we show that the motor control performance of a humanoid robot can be improved efficiently by reusing its previous experiences in a Reinforcement Learning (RL) framework. RL is becoming a common approach for acquiring a nonlinear optimal policy through trial and error. However, applying RL to real robot control is very difficult because it usually requires many learning trials, and that many trials cannot be executed in a real environment given the limited durability of the real system. Therefore, instead of executing many learning trials, we use a recently developed RL algorithm called importance-weighted Policy Gradients with Parameter-based Exploration (PGPE), with which the robot can efficiently reuse previously sampled data to improve its policy parameters. We apply importance-weighted PGPE to CB-i, our real humanoid robot, and show that it can learn both a target-reaching movement and a cart-pole swing-up movement in a real environment within 10 minutes, without any prior knowledge of the task or any carefully designed initial trajectory.
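The sample-reuse idea named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper. It assumes the usual PGPE setting, in which policy parameters are drawn from independent Gaussians with a mean and a standard deviation per dimension, and it reweights returns from parameters sampled under older hyper-parameters by the ratio of current to old sampling density. The function names, the argument layout, and the plain scalar `baseline` are hypothetical choices made for illustration; the estimator and baseline actually used on CB-i may differ.

```python
import math


def gaussian_logpdf(theta, mean, std):
    """Log-density of independent Gaussians, summed over parameter dimensions."""
    return sum(
        -0.5 * math.log(2.0 * math.pi) - math.log(s) - 0.5 * ((t - m) / s) ** 2
        for t, m, s in zip(theta, mean, std)
    )


def iw_pgpe_gradient(thetas, returns, mean, std, old_mean, old_std, baseline=0.0):
    """Importance-weighted gradient estimate of expected return w.r.t. (mean, std).

    thetas were sampled under (old_mean, old_std); the gradient is evaluated
    at the current hyper-parameters (mean, std). The scalar baseline is an
    illustrative assumption, not the paper's choice.
    """
    dim = len(mean)
    grad_mean = [0.0] * dim
    grad_std = [0.0] * dim
    for theta, ret in zip(thetas, returns):
        # Importance weight: current sampling density over old sampling density.
        w = math.exp(
            gaussian_logpdf(theta, mean, std)
            - gaussian_logpdf(theta, old_mean, old_std)
        )
        for i in range(dim):
            diff = theta[i] - mean[i]
            # Derivatives of the Gaussian log-density w.r.t. mean and std.
            grad_mean[i] += w * (ret - baseline) * diff / std[i] ** 2
            grad_std[i] += (
                w * (ret - baseline) * (diff ** 2 - std[i] ** 2) / std[i] ** 3
            )
    n = len(thetas)
    return [g / n for g in grad_mean], [g / n for g in grad_std]
```

When the old and current hyper-parameters coincide, every weight equals 1 and the estimate reduces to the ordinary on-policy PGPE gradient. Reused samples that are unlikely under the current distribution are down-weighted, which is what lets earlier rollouts contribute without biasing the update.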

Keywords

Humanoid robot, Reinforcement learning, Computer science, Reuse, Trajectory, Task (project management), Robot, Artificial intelligence, Control (management), Machine learning
