Auto-exploratory average reward reinforcement learning

DoKyeong Ok, Prasad Tadepalli

Publication year
1996
Citations
9

Abstract

We introduce a model-based average reward Reinforcement Learning method called H-learning and compare it with its discounted counterpart, Adaptive Real-Time Dynamic Programming, in a simulated robot scheduling task. We also introduce an extension to H-learning, which automatically explores the unexplored parts of the state space, while always choosing greedy actions with respect to the current value function. We show that this "Auto-exploratory H-learning" performs better than the original H-learning under previously studied exploration methods such as random, recency-based, or counter-based exploration.

Introduction

Reinforcement Learning (RL) is the study of learning agents that improve their performance at some task by receiving rewards and punishments from the environment. Most approaches to reinforcement learning, including Q-learning (Watkins and Dayan 92) and Adaptive Real-Time Dynamic Programming (ARTDP) (Barto, Bradtke, & Singh 95), optimize the total discounted reward the ...
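
The difference between the discounted criterion and the average reward criterion named above can be illustrated with a minimal sketch. This is not the paper's H-learning or ARTDP algorithm. The function names, the two reward patterns, and the discount factor 0.5 are all hypothetical, chosen only to show that the two criteria can rank the same behaviors differently.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over a finite reward sequence."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


def average_reward(rewards: Sequence[float]) -> float:
    """Mean reward per step over a finite reward sequence."""
    if not rewards:
        raise ValueError("rewards must be non-empty")
    return sum(rewards) / len(rewards)


# Two hypothetical repeating reward patterns, unrolled for 300 steps.
quick_small = [1.0, 0.0, 0.0] * 100  # reward 1 at the start of every 3-step cycle
slow_large = [0.0, 0.0, 2.0] * 100   # reward 2 at the end of every 3-step cycle

gamma = 0.5  # assumed discount factor, for illustration only

print("discounted:", discounted_return(quick_small, gamma),
      discounted_return(slow_large, gamma))
print("average:   ", average_reward(quick_small), average_reward(slow_large))
```

Under these assumed numbers, the discounted sum favors the pattern that pays sooner, while the per-step average favors the pattern that pays more over each cycle.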

Keywords

Reinforcement learning, Computer science, Bellman equation, Artificial intelligence, Scheduling (production processes), Task (project management), Robot, State space, Machine learning, Mathematical optimization
