Training and delayed reinforcements in Q-learning agents

Marco Dorigo

Publication year: 1997
Citations: 21

Abstract

Q-learning can greatly improve its convergence speed if helped by immediate reinforcements provided by a trainer able to judge the usefulness of actions as stage setting with respect to the goal of the agent. This article experimentally investigates this hypothesis studying the integration of immediate reinforcements (also called training reinforcements) with standard delayed reinforcements (namely, reinforcements assigned only when the agent–environment relationship reaches a peculiar state, such as when the agent reaches a target). The article proposes two new algorithms (TL and MTL) able to exploit even locally wrong and misleading training reinforcements. The proposed algorithms are tested against Q-learning and other algorithms (AB–LEC and BB–LEC) described in the literature [S. D. Whitehead, TR-365, University of Rochester, NY, 1991], which also make use of training reinforcements. Experiments are run in a grid world where a Q-agent, a simple simulated robot, must learn to reach a target. © 1997 John Wiley & Sons, Inc.
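As a rough illustration of the setting described in the abstract, the sketch below runs tabular Q-learning in a small grid world and optionally adds an immediate training reinforcement to the delayed reinforcement given at the target. It is not the TL or MTL algorithm from the article, nor AB-LEC or BB-LEC. The function name `train`, the distance-based trainer rule, the reward magnitudes, and all parameter values are assumptions made for illustration only.

```python
import random

def train(size=5, episodes=200, use_trainer=True, alpha=0.5, gamma=0.9,
          epsilon=0.1, seed=0):
    """Tabular Q-learning on a size x size grid; returns the Q-table as a dict."""
    rng = random.Random(seed)
    actions = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    target = (size - 1, size - 1)
    q = {((x, y), a): 0.0
         for x in range(size) for y in range(size) for a in range(4)}

    def dist(s):
        # Manhattan distance to the target, used only by the hypothetical trainer.
        return abs(s[0] - target[0]) + abs(s[1] - target[1])

    for _ in range(episodes):
        s = (0, 0)
        for _ in range(100):  # cap on steps per episode (assumed value)
            if rng.random() < epsilon:
                a = rng.randrange(4)  # explore
            else:
                best = max(q[(s, b)] for b in range(4))
                a = rng.choice([b for b in range(4) if q[(s, b)] == best])
            dx, dy = actions[a]
            # Moves into a wall leave the agent where it is.
            nxt = (min(max(s[0] + dx, 0), size - 1),
                   min(max(s[1] + dy, 0), size - 1))
            # Delayed reinforcement: given only when the target is reached.
            r = 1.0 if nxt == target else 0.0
            if use_trainer:
                # Assumed trainer: small immediate reinforcement for moves
                # that bring the agent closer to the target, penalty otherwise.
                r += 0.1 if dist(nxt) < dist(s) else -0.1
            future = 0.0 if nxt == target else max(q[(nxt, b)] for b in range(4))
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
            s = nxt
            if s == target:
                break
    return q
```

Calling `train(use_trainer=False)` gives the plain delayed-reinforcement baseline, so the two Q-tables can be compared for the same seed.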

Keywords

Reinforcement learning, Trainer, Reinforcement, Computer science, Artificial intelligence, Simple (philosophy), Exploit, Convergence (economics), Grid, Q-learning

