Planning how to learn

Haoyu Bai, David Hsu, Wee Sun Lee

Year: 2013
Citations: 22

Abstract

When a robot uses an imperfect system model to plan its actions, a key challenge is the exploration-exploitation trade-off between two sometimes conflicting objectives: (i) learning and improving the model, and (ii) immediate progress towards the goal, according to the current model. To address model uncertainty systematically, we propose to use Bayesian reinforcement learning and cast it as a partially observable Markov decision process (POMDP). We present a simple algorithm for offline POMDP planning in the continuous state space. Offline planning produces a POMDP policy, which can be executed efficiently online as a finite-state controller. This approach seamlessly integrates planning and learning: it incorporates learning objectives in the computed plan, which then enables the robot to learn nearly optimally online and reach the goal. We evaluated the approach in simulations on two distinct tasks, acrobot swing-up and autonomous vehicle navigation amidst pedestrians, and obtained interesting preliminary results.
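
To illustrate how model uncertainty can become part of a POMDP state, the following is a minimal sketch and not the algorithm from the paper. It assumes a single unknown model parameter that takes one of a few candidate values, and it maintains a belief over those values with Bayes' rule. The function name `update_belief`, the candidate values, and the success/failure observation model are hypothetical and chosen only for this example.

```python
from typing import Callable, Dict


def update_belief(
    belief: Dict[float, float],
    likelihood: Callable[[float], float],
) -> Dict[float, float]:
    """Apply Bayes' rule over a discrete set of candidate model parameters.

    belief maps each candidate parameter value to its prior probability.
    likelihood(theta) is the probability of the latest observation if the
    true parameter were theta.
    """
    unnormalized = {theta: p * likelihood(theta) for theta, p in belief.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under every candidate")
    return {theta: w / total for theta, w in unnormalized.items()}


# Hypothetical setup: an action succeeds with unknown probability theta,
# and the robot initially considers 0.2 and 0.8 equally likely.
prior = {0.2: 0.5, 0.8: 0.5}

# Observing one success shifts the belief towards the larger value.
after_success = update_belief(prior, lambda theta: theta)

# Observing a failure next shifts it back towards the smaller value.
after_failure = update_belief(after_success, lambda theta: 1.0 - theta)
```

In this sketch the belief plays the role of the POMDP belief state: an action that yields an informative observation changes it, so a planner that reasons over beliefs can weigh the value of learning against immediate progress towards the goal.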

Keywords

Partially observable Markov decision process, Reinforcement learning, Computer science, Robot, Plan (archaeology), Markov decision process, Artificial intelligence, Key (lock), Process (computing), Machine learning
