LEARNING

Kernel dynamic policy programming: Practical reinforcement learning for high-dimensional robots

Yunduan Cui, Takamitsu Matsubara, Kenji Sugimoto

Year published: 2016
Citations: 7

Abstract

Applying value-function-based reinforcement learning algorithms to real robots has been infeasible because approximating a high-dimensional value function is difficult. The difficulty of such high-dimensional value function approximation in previous methods is twofold: 1) instability of the value function approximation caused by non-smooth policy updates and 2) the computational complexity associated with a high-dimensional state-action space. To cope with these issues, we propose Kernel Dynamic Policy Programming (KDPP), which smoothly updates the value function in an implicit high-dimensional feature space. The smooth policy update is promoted by adding the Kullback-Leibler divergence between the current and updated policies to the reward function as a regularization term, which stabilizes the value function approximation. The computational complexity is reduced by applying the kernel trick in the value function approximation. KDPP can therefore be interpreted as a novel yet practical extension of Dynamic Policy Programming (DPP) and kernelized value-function-based reinforcement learning methods that combines the strengths of both. We successfully applied KDPP to learn to unscrew a bottle cap with a humanoid robot hand driven by Pneumatic Artificial Muscles (PAMs), a system with a 24-dimensional state space, using a limited number of samples and commonplace computational resources.
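To make the smooth-update idea concrete, here is a minimal tabular sketch of the DPP recursion as it is commonly stated, in which action preferences Psi are updated against a Boltzmann-weighted state value rather than a hard max. It is not the authors' KDPP: there is no kernel approximation here, and the two-state MDP, the function names, and the values of `gamma` and `eta` are all assumptions chosen for illustration.

```python
import math

def softmax_weights(prefs, eta):
    """Boltzmann policy pi(a|s) proportional to exp(eta * Psi(s, a))."""
    top = max(prefs)  # subtract the max for numerical stability
    w = [math.exp(eta * (p - top)) for p in prefs]
    total = sum(w)
    return [x / total for x in w]

def soft_value(prefs, eta):
    """Boltzmann-weighted average of the action preferences in one state."""
    return sum(w * p for w, p in zip(softmax_weights(prefs, eta), prefs))

def dpp_sweep(psi, reward, next_state, gamma, eta):
    """One synchronous DPP update over every (state, action) pair.

    Psi'(s, a) = Psi(s, a) - v(s) + r(s, a) + gamma * v(s'),
    where v is the soft value and transitions are deterministic.
    """
    v = [soft_value(row, eta) for row in psi]
    return [
        [psi[s][a] - v[s] + reward[s][a] + gamma * v[next_state[s][a]]
         for a in range(len(psi[s]))]
        for s in range(len(psi))
    ]

# Hypothetical toy MDP: action 0 stays, action 1 switches state.
# Only staying in state 1 pays a reward of 1.
reward = [[0.0, 0.0], [1.0, 0.0]]
next_state = [[0, 1], [1, 0]]

psi = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(100):
    psi = dpp_sweep(psi, reward, next_state, gamma=0.9, eta=1.0)

pi_state0 = softmax_weights(psi[0], eta=1.0)  # should favor switching to state 1
pi_state1 = softmax_weights(psi[1], eta=1.0)  # should favor staying in state 1
print(pi_state0, pi_state1)
```

Because each update subtracts the soft value of the current state, the preference gap between actions grows gradually from sweep to sweep, so the policy shifts smoothly instead of jumping. In the setting the abstract describes, the table `psi` would be replaced by a kernel-based approximator fitted from samples.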

Keywords

Reinforcement learning, Bellman equation, Function approximation, Computer science, Q-learning, Dynamic programming, State space, Mathematical optimization, Function (biology), Regularization (linguistics)
