LEARNING

Reinforcement learning of full-body humanoid motor skills

Freek Stulp, Jonas Buchli, Evangelos A. Theodorou, Stefan Schaal

Publication year
2010
Citations
51

Abstract

Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom and their state and action spaces are continuous. Most reinforcement learning algorithms therefore become computationally infeasible and require a prohibitive number of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI²), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is model-free, and performs numerically robustly in high-dimensional learning problems. We demonstrate how PI² is able to learn full-body motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI² in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.
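
The abstract does not spell out the update rule, so the following is only a rough sketch of the general idea behind a model-free, exploration-noise-driven update of this kind, not the authors' implementation. It perturbs the policy parameters with Gaussian noise, scores each perturbed rollout with a cost, and moves the parameters by a cost-weighted average of the noise. This is a simplified, episode-level variant: PI² as usually described computes the weights per time step from the cost-to-go along each trajectory. The names `pi2_style_update` and `toy_cost`, the quadratic cost, the 2-parameter policy, and the sensitivity constant `h=10.0` are all assumptions made for illustration.

```python
import math
import random


def pi2_style_update(theta, cost_fn, n_rollouts=10, noise_std=0.1, h=10.0, rng=None):
    """One cost-weighted averaging update in the spirit of PI^2 (simplified sketch)."""
    rng = rng or random.Random(0)
    # Sample exploration noise and evaluate each perturbed parameter vector.
    noises = [[rng.gauss(0.0, noise_std) for _ in theta] for _ in range(n_rollouts)]
    costs = [cost_fn([t + e for t, e in zip(theta, eps)]) for eps in noises]
    # Map costs to weights: low cost gives high weight, normalized over rollouts.
    lo, hi = min(costs), max(costs)
    span = (hi - lo) or 1.0  # avoid division by zero when all costs are equal
    weights = [math.exp(-h * (c - lo) / span) for c in costs]
    total = sum(weights)
    probs = [w / total for w in weights]
    # New parameters: old ones plus the probability-weighted average of the noise.
    return [
        t + sum(p * eps[i] for p, eps in zip(probs, noises))
        for i, t in enumerate(theta)
    ]


def toy_cost(theta):
    # Hypothetical stand-in for a rollout cost: squared distance to a target.
    target = [1.0, -2.0]
    return sum((t - g) ** 2 for t, g in zip(theta, target))


rng = random.Random(42)
theta = [0.0, 0.0]
initial_cost = toy_cost(theta)
for _ in range(200):
    theta = pi2_style_update(theta, toy_cost, rng=rng)
final_cost = toy_cost(theta)
```

Note that the only quantity a user would tune here is `noise_std`, which mirrors the abstract's claim that exploration noise is the sole open parameter; no model of the dynamics and no gradient of the cost is used.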

Keywords

Reinforcement learning, Humanoid robot, Computer science, Artificial intelligence, Context (archaeology), Robot
