Learning Environmental Calibration Actions for Policy Self-Evolution

Chao Zhang, Yang Yu, Zhi‐Hua Zhou

Year: 2018
Citations: 13
Access: Open access

Abstract

Reinforcement learning in the physical world is often expensive, so simulators are commonly used to train policies. Because of simulation error, however, policies trained in a simulator are hard to deploy directly in the physical world. How to reuse these policies efficiently in the real environment is therefore a key issue. To address it, this paper presents a policy self-evolution process: in the target environment, the agent first executes a few calibration actions to perceive the environment, and then reuses the previous policies according to what it observes. In this way, policy learning in the target environment is reduced to environment identification through the calibration actions, which needs far fewer samples than learning a policy from scratch. We propose the POSEC (POlicy Self-Evolution by Calibration) approach, which learns the most informative calibration actions for policy self-evolution. Using three robotic arm control tasks as test beds, we show that the proposed method can learn a good policy for a new arm with only a few (e.g., five) samples from the target environment.
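The probe-then-reuse loop described in the abstract can be illustrated with a toy sketch. This is not the POSEC algorithm: the one-joint arm dynamics, the names `respond`, `pick_calibration_actions`, and `self_evolve`, the link lengths, and the spread-based selection rule are all assumptions made for illustration, and nearest-neighbour matching stands in for whatever identification model the paper actually uses.

```python
import math

# Hypothetical toy dynamics: a one-joint arm whose tip height depends on an
# unknown link length. It stands in for both the simulators and the real arm.
def respond(link_length, action):
    return link_length * math.sin(action)

# Hypothetical source environments (simulators), one pretrained policy each.
# Here a "policy" is just the joint angle that lifts the tip to height 0.5.
SOURCE_LENGTHS = [0.6, 0.8, 1.0, 1.2]
POLICIES = {L: math.asin(0.5 / L) for L in SOURCE_LENGTHS}

def signature(link_length, calibration_actions):
    """Responses of one environment to the calibration actions."""
    return [respond(link_length, a) for a in calibration_actions]

def pick_calibration_actions(candidates, k):
    """Keep the k candidates whose responses differ most across the sources.

    A crude stand-in (assumption) for learning informative calibration actions.
    """
    def spread(action):
        responses = [respond(L, action) for L in SOURCE_LENGTHS]
        return max(responses) - min(responses)
    return sorted(candidates, key=spread, reverse=True)[:k]

def self_evolve(target_respond, calibration_actions):
    """Probe the target, match it to the closest source, reuse that policy."""
    observed = [target_respond(a) for a in calibration_actions]
    def distance(L):
        expected = signature(L, calibration_actions)
        return sum((o - e) ** 2 for o, e in zip(observed, expected))
    best = min(SOURCE_LENGTHS, key=distance)
    return best, POLICIES[best]

candidates = [0.0, 0.4, 0.8, 1.2, 1.5]
calibration_actions = pick_calibration_actions(candidates, k=2)

# The "real" arm has a link length (0.95) that no simulator matches exactly.
matched, policy = self_evolve(lambda a: respond(0.95, a), calibration_actions)
```

In this sketch only two probes of the target are needed to select a policy, which mirrors the abstract's point that identifying the environment costs far fewer samples than learning a policy from scratch.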

Keywords

Computer science, Reinforcement learning, Task (project management), Calibration, Process (computing), Reuse, Identification (biology), Key (lock), Human–computer interaction, Artificial intelligence
