LEARNING
Active policy iteration: efficient exploration through active learning for value function approximation in reinforcement learning
Takayuki Akiyama, Hirotaka Hachiya, Masashi Sugiyama
- Year published
- 2009
- Citations
- 2
Abstract
Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. Then we propose a design method of good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. We demonstrate the usefulness of the proposed method, named active policy iteration (API), through simulations with a batting robot.
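To illustrate why a least-squares policy evaluation step can be treated as linear regression, here is a minimal sketch. It is not taken from the paper: it assumes a linear value model Q(s, a) = phi(s, a) @ theta and fits theta by regressing immediate rewards on the differenced features phi(s, a) - gamma * phi(s', a'). The function name `fit_q_weights`, the ridge term, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_q_weights(phi_sa, phi_next, rewards, gamma=0.9, ridge=1e-8):
    """Least-squares fit of theta so that (phi_sa - gamma * phi_next) @ theta ~= rewards.

    phi_sa:   (n, d) features of the visited state-action pairs
    phi_next: (n, d) features of the successor state-action pairs
    rewards:  (n,) observed immediate rewards
    """
    psi = phi_sa - gamma * phi_next                  # regression inputs
    a = psi.T @ psi + ridge * np.eye(psi.shape[1])   # regularized Gram matrix
    b = psi.T @ rewards
    return np.linalg.solve(a, b)


# Toy data (assumed, noise-free) to check that the fit recovers known weights.
rng = np.random.default_rng(0)
n, d = 200, 3
gamma = 0.9
phi_sa = rng.normal(size=(n, d))
phi_next = rng.normal(size=(n, d))
true_theta = np.array([1.0, -2.0, 0.5])
rewards = (phi_sa - gamma * phi_next) @ true_theta

theta_hat = fit_q_weights(phi_sa, phi_next, rewards, gamma)
```

Because the estimate depends on the sampled inputs only through the matrix `psi`, the choice of which state-action pairs to sample plays the same role as input design in ordinary regression, which is where active learning criteria can be applied.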
Keywords
Reinforcement learning, Computer science, Bellman equation, Sampling (signal processing), Artificial intelligence, Machine learning, Function approximation, Active learning (machine learning), Function (biology), Mathematical optimization
Related papers
OTHER
📊 26,957 citations
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
PERCEPTION
📊 22,245 citations
Artificial intelligence: a modern approach
1995
OTHER
📊 18,993 citations
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991
SWARM
📊 14,853 citations
A new optimizer using particle swarm theory
R.C. Eberhart, James Kennedy
2002