Training and delayed reinforcements in Q-learning agents

Marco Dorigo

Publication year
1997
Citations
21

Abstract

Q-learning can greatly improve its convergence speed if helped by immediate reinforcements provided by a trainer able to judge the usefulness of actions as stage setting with respect to the goal of the agent. This article experimentally investigates this hypothesis studying the integration of immediate reinforcements (also called training reinforcements) with standard delayed reinforcements (namely, reinforcements assigned only when the agent–environment relationship reaches a peculiar state, such as when the agent reaches a target). The article proposes two new algorithms (TL and MTL) able to exploit even locally wrong and misleading training reinforcements. The proposed algorithms are tested against Q-learning and other algorithms (AB–LEC and BB–LEC) described in the literature [S. D. Whitehead, TR-365, University of Rochester, NY, 1991], which also make use of training reinforcements. Experiments are run in a grid world where a Q-agent, a simple simulated robot, must learn to reach a target. © 1997 John Wiley & Sons, Inc.
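
The idea of combining training reinforcements with a delayed reinforcement can be illustrated with a minimal sketch. It is not the TL or MTL algorithm from the article, nor AB-LEC or BB-LEC: it is plain tabular Q-learning in which a hypothetical trainer adds a small immediate reinforcement to the delayed one received at the target. The 5x5 grid, the distance-based trainer rule, the reinforcement magnitudes, and all names (`step`, `trainer_reinforcement`, `train`) are assumptions made for illustration only.

```python
import random

# Assumed setup: 5x5 grid, agent starts in one corner, target in the opposite one.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
SIZE = 5
START, TARGET = (0, 0), (4, 4)


def step(state, action):
    """Move within the grid; a move into a wall leaves the agent in place."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    return (row, col)


def distance(state):
    """Manhattan distance from a cell to the target."""
    return abs(state[0] - TARGET[0]) + abs(state[1] - TARGET[1])


def trainer_reinforcement(state, next_state):
    """Hypothetical trainer: +0.1 if the move got closer to the target, else -0.1."""
    return 0.1 if distance(next_state) < distance(state) else -0.1


def train(episodes=200, use_trainer=True, alpha=0.5, gamma=0.9,
          epsilon=0.1, max_steps=500, seed=0):
    """Tabular Q-learning; returns the Q-table and the steps taken per episode."""
    rng = random.Random(seed)
    n_actions = len(ACTIONS)
    q = {((r, c), a): 0.0
         for r in range(SIZE) for c in range(SIZE) for a in range(n_actions)}
    steps_per_episode = []
    for _ in range(episodes):
        state, steps = START, 0
        while state != TARGET and steps < max_steps:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(q[(state, i)] for i in range(n_actions))
                a = rng.choice([i for i in range(n_actions)
                                if q[(state, i)] == best])
            nxt = step(state, ACTIONS[a])
            # Delayed reinforcement: given only when the target is reached.
            reward = 1.0 if nxt == TARGET else 0.0
            # Immediate (training) reinforcement, added on every step if enabled.
            if use_trainer:
                reward += trainer_reinforcement(state, nxt)
            future = 0.0 if nxt == TARGET else max(
                q[(nxt, i)] for i in range(n_actions))
            q[(state, a)] += alpha * (reward + gamma * future - q[(state, a)])
            state, steps = nxt, steps + 1
        steps_per_episode.append(steps)
    return q, steps_per_episode
```

Calling `train(use_trainer=False)` and `train(use_trainer=True)` with the same seed and comparing the two `steps_per_episode` lists gives a rough picture of how the immediate reinforcements affect learning speed in this toy setting. The sketch assumes a trainer that is always locally correct, whereas the article is concerned with trainers whose reinforcements may be locally wrong or misleading.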

Keywords

Reinforcement learning, Trainer, Reinforcement, Computer science, Artificial intelligence, Simple (philosophy), Exploit, Convergence (economics), Grid, Q-learning
