Active policy iteration: efficient exploration through active learning for value function approximation in reinforcement learning

Takayuki Akiyama, Hirotaka Hachiya, Masashi Sugiyama

Year: 2009
Citations: 2

Abstract

Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. We then propose a method for designing good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. We demonstrate the usefulness of the proposed method, named active policy iteration (API), through simulations with a batting robot.
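
The link between LSPI and linear regression mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `fit_q_weights`, the tabular features, and the two-state toy data are all hypothetical, and transitions are assumed deterministic. With a linear model Q(s, a) ≈ θᵀφ(s, a), the Bellman equation lets the immediate reward be regressed on ψ = φ(s, a) − γφ(s', π(s')), which is an ordinary least-squares problem.

```python
import numpy as np

def fit_q_weights(phi, phi_next, rewards, gamma):
    """Least-squares fit of theta in Q(s, a) ~ theta @ phi(s, a).

    phi:      (N, B) features of the sampled (s, a) pairs
    phi_next: (N, B) features of (s', pi(s')) under the evaluated policy
    rewards:  (N,)   observed immediate rewards
    """
    # Regressors derived from the Bellman equation:
    # psi = phi(s, a) - gamma * phi(s', pi(s'))
    psi = phi - gamma * phi_next
    # Ordinary linear regression of the rewards on psi
    theta, *_ = np.linalg.lstsq(psi, rewards, rcond=None)
    return theta

# Hypothetical toy data: two states, one action, one-hot features.
# State 0 moves to state 1 with reward 0; state 1 stays put with reward 1.
gamma = 0.5
phi = np.array([[1.0, 0.0], [0.0, 1.0]])
phi_next = np.array([[0.0, 1.0], [0.0, 1.0]])
rewards = np.array([0.0, 1.0])

theta = fit_q_weights(phi, phi_next, rewards, gamma)
print(theta)  # approximately [1. 2.]
```

In this toy case the fitted weights match the true values (Q = 2 in state 1, Q = 1 in state 0). Because the fit is a linear regression, the choice of which (s, a) pairs to sample plays the role of an input design, which is where active learning methods for regression could be applied.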

Keywords

Reinforcement learning, Computer science, Bellman equation, Sampling (signal processing), Artificial intelligence, Machine learning, Function approximation, Active learning (machine learning), Function (biology), Mathematical optimization
