Auto-exploratory average reward reinforcement learning

DoKyeong Ok, Prasad Tadepalli

Published: 1996
Citations: 9

Abstract

We introduce a model-based average reward Reinforcement Learning method called H-learning and compare it with its discounted counterpart, Adaptive Real-Time Dynamic Programming, in a simulated robot scheduling task. We also introduce an extension to H-learning that automatically explores the unexplored parts of the state space while always choosing greedy actions with respect to the current value function. We show that this "Auto-exploratory H-learning" performs better than the original H-learning under previously studied exploration methods such as random, recency-based, or counter-based exploration.

Introduction

Reinforcement Learning (RL) is the study of learning agents that improve their performance at some task by receiving rewards and punishments from the environment. Most approaches to reinforcement learning, including Q-learning (Watkins and Dayan 92) and Adaptive Real-Time Dynamic Programming (ARTDP) (Barto, Bradtke, & Singh 95), optimize the total discounted reward the ...
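To make the idea of a model-based average-reward learner concrete, here is a minimal tabular sketch in the style of H-learning. It estimates transition probabilities and mean rewards from counts, keeps a relative value h(i) per state and an estimate ρ of the average reward per step, and backs up h(i) ← max_u [r_i(u) + Σ_j p_ij(u) h(j)] − ρ. Several details are assumptions for illustration rather than details taken from this abstract: the class name `HLearner`, the harmonic step size for ρ, updating ρ only on greedy actions, the hypothetical two-state `toy_step` MDP, and ε = 0.2. The ε-greedy loop stands in for the random-exploration baseline the abstract mentions. The auto-exploratory variant instead always acts greedily, and its exploration mechanism is not reproduced here.

```python
import random
from collections import defaultdict


class HLearner:
    """Tabular model-based average-reward learner in the style of H-learning (sketch)."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.h = defaultdict(float)    # relative (bias) value h(i), initially 0
        self.rho = 0.0                 # estimate of the average reward per step
        self.alpha = 1.0               # step size for rho (assumed schedule: 1, 1/2, 1/3, ...)
        self.n_sa = defaultdict(int)   # N(i, a): times action a was taken in state i
        self.n_saj = defaultdict(int)  # N(i, a, j): times that led to state j
        self.succ = defaultdict(set)   # successor states observed for (i, a)
        self.r = defaultdict(float)    # running mean immediate reward r_i(a)

    def value(self, i, a):
        """r_i(a) + sum_j p_ij(a) * h(j), with count-based probability estimates."""
        n = self.n_sa[(i, a)]
        if n == 0:
            return 0.0  # untried action: no model yet
        expected_h = sum(self.n_saj[(i, a, j)] / n * self.h[j]
                         for j in self.succ[(i, a)])
        return self.r[(i, a)] + expected_h

    def greedy_actions(self, i):
        """All actions that maximise value(i, a) in state i."""
        vals = [self.value(i, a) for a in self.actions]
        best = max(vals)
        return [a for a, v in zip(self.actions, vals) if v == best]

    def update(self, i, a, reward, j):
        """Update the model, rho and h(i) after taking a in i, receiving reward, landing in j."""
        self.n_sa[(i, a)] += 1
        self.n_saj[(i, a, j)] += 1
        self.succ[(i, a)].add(j)
        self.r[(i, a)] += (reward - self.r[(i, a)]) / self.n_sa[(i, a)]
        # rho learns only from greedy actions, so exploratory moves do not bias it.
        if a in self.greedy_actions(i):
            sample = self.r[(i, a)] - self.h[i] + self.h[j]
            self.rho += self.alpha * (sample - self.rho)
            self.alpha = self.alpha / (self.alpha + 1.0)
        # Average-reward Bellman backup for the current state.
        self.h[i] = max(self.value(i, u) for u in self.actions) - self.rho


def toy_step(state, action):
    """Hypothetical 2-state MDP: 'stay' pays 1 in state 0 and 10 in state 1; 'go' pays 0 and switches."""
    if action == "stay":
        return (1.0 if state == 0 else 10.0), state
    return 0.0, 1 - state


def run(steps=5000, epsilon=0.2, seed=0):
    """Epsilon-greedy loop: the random-exploration baseline, not the auto-exploratory variant."""
    rng = random.Random(seed)
    agent = HLearner(["stay", "go"])
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(agent.actions)                  # random exploratory action
        else:
            action = rng.choice(agent.greedy_actions(state))    # greedy, ties broken randomly
        reward, nxt = toy_step(state, action)
        agent.update(state, action, reward, nxt)
        state = nxt
    return agent


if __name__ == "__main__":
    agent = run()
    print("estimated average reward:", round(agent.rho, 2))
    print("greedy in state 0:", agent.greedy_actions(0))
    print("greedy in state 1:", agent.greedy_actions(1))
```

In this toy MDP the gain-optimal policy is to "go" from state 0 and then "stay" in state 1 forever, which earns 10 per step. So ρ should approach 10, and the greedy actions should become "go" in state 0 and "stay" in state 1. A discounted learner would instead trade these rewards off through its discount factor.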

Keywords

Reinforcement learning, Computer science, Bellman equation, Artificial intelligence, Scheduling (production processes), Task (project management), Robot, State space, Machine learning, Mathematical optimization

