Learning Environmental Calibration Actions for Policy Self-Evolution

Chao Zhang, Yang Yu, Zhi‐Hua Zhou

发表年份: 2018
引用次数: 13
访问权限: 开放获取

摘要

Reinforcement learning in physical world is often expensive. Simulators are commonly employed to train policies. Due to the simulation error, trained-in-simulator policies are hard to be directly deployed in physical world. Therefore, how to efficiently reuse these policies to the real environment is a key issue. To address this issue, this paper presents a policy self-evolution process: in the target environment, the agent firstly executes a few calibration actions to perceive the environment, and then reuses the previous policies according to the observation of the environment. In this way, the mission of policy learning in the target environment is reduced to the task of environment identification through executing the calibration actions, which needs much less samples than learning a policy from scratch. We propose the POSEC (POlicy Self-Evolution by Calibration) approach, which learns the most informative calibration actions for policy self-evolution. Taking three robotic arm controlling tasks as the test beds, we show that the proposed method can learn a fine policy for a new arm with only a few (e.g. five) samples of the target environment.

关键词

Computer scienceReinforcement learningTask (project management)CalibrationProcess (computing)ReuseIdentification (biology)Key (lock)Human–computer interactionArtificial intelligence

Learning Environmental Calibration Actions for Policy Self-Evolution

摘要

关键词

相关论文

Statistical Learning Theory

Artificial intelligence: a modern approach

Applied Nonlinear Control

A new optimizer using particle swarm theory