Two mode Q-learning

Kui-Hong Park, Jong-Hwan Kim

Year of publication: 2004
Citations: 3

Abstract

This paper proposes two mode Q-learning, an extension of Q-learning, the well-known reinforcement learning scheme, that uses both the success and failure experiences of an agent to achieve faster convergence. In conventional Q-learning, an agent that enters a "fail" state receives a punishment from the environment, and this punishment decreases the Q value of the action that produced the failure. The proposed two mode Q-learning, by contrast, selects actions in the state-action space using both normal and failure Q values. A failure Q value module determines the failure Q value from the agent's previous failure experiences. To demonstrate its effectiveness, the proposed method is compared with conventional Q-learning in a goalie system that performs goalkeeping in robot soccer.
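
The abstract does not state how the normal and failure Q values are combined, so the following is only a minimal tabular sketch of the general idea. The class name `TwoModeQAgent`, the `fail_weight` parameter, the scoring rule (normal Q value minus a weighted failure Q value), the failure update rule, and the treatment of failure as a terminal state are all assumptions made for illustration, not the authors' formulation.

```python
import random
from collections import defaultdict


class TwoModeQAgent:
    """Tabular agent with a normal Q table and a separate failure Q table.

    ASSUMPTION: the scoring rule and the failure update are illustrative
    choices; the abstract does not specify them.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1,
                 fail_weight=1.0, seed=0):
        self.actions = list(actions)
        self.alpha = alpha              # learning rate
        self.gamma = gamma              # discount factor
        self.epsilon = epsilon          # exploration probability
        self.fail_weight = fail_weight  # hypothetical weight on the failure value
        self.q = defaultdict(float)       # normal Q values, keyed by (state, action)
        self.q_fail = defaultdict(float)  # failure Q values, keyed by (state, action)
        self.rng = random.Random(seed)

    def score(self, state, action):
        # Assumed combination: penalize actions that have led to failure before.
        key = (state, action)
        return self.q[key] - self.fail_weight * self.q_fail[key]

    def select_action(self, state):
        # Epsilon-greedy selection over the combined score.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.score(state, a))

    def update(self, state, action, reward, next_state, failed):
        key = (state, action)
        # Standard one-step Q-learning update on the normal table.
        # Failure is treated as terminal here (an assumption).
        if failed:
            target = reward
        else:
            target = reward + self.gamma * max(
                self.q[(next_state, a)] for a in self.actions)
        self.q[key] += self.alpha * (target - self.q[key])
        # Assumed failure module: track how often this pair ends in failure.
        fail_target = 1.0 if failed else 0.0
        self.q_fail[key] += self.alpha * (fail_target - self.q_fail[key])


# Usage: after one failure caused by "left", the greedy choice avoids it.
agent = TwoModeQAgent(actions=["left", "stay", "right"], epsilon=0.0)
agent.update("s0", "left", reward=-1.0, next_state="fail", failed=True)
chosen = agent.select_action("s0")
```

In this sketch a single failure lowers the score of the offending action twice, once through the punished normal Q value and once through the failure Q value, which is one plausible way a failure module could speed up avoidance of bad actions.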

Keywords

Q-learning; Reinforcement learning; Convergence (economics); Action selection; Punishment (psychology); Computer science; Action (physics); Failure mode and effects analysis; Artificial intelligence; State space
