Instruction for reinforcement learning agent based on sub-rewards and forgetting

Toshihiko Watanabe, Toru Sawa

Year of publication: 2010
Citations: 6

Abstract

Reinforcement learning is one of the techniques needed in the control systems of intelligent agents such as autonomous mobile robots. For an agent to acquire knowledge or skills, it is desirable that learning rely only on rewards rather than on a teaching signal. However, applying reinforcement learning to real problems raises many difficulties, the most severe being the huge number of iterations that the learning process requires. Meanwhile, several methods such as intrinsically motivated reinforcement learning have been studied. These methods use internal rewards to formulate behavioral rules abstracted from the results of reinforcement learning, expressed as action rules, and they are promising for decomposing an agent's complicated tasks. In this abstraction process, segmentation of learning is an indispensable technique. Our motivation is to make appropriate use of instructions that can be given to the reinforcement learning agent along with the main rewards, in order to hasten learning and to attain valid learning performance in preparation for segmentation. In this study, we propose an instruction approach for reinforcement learning agents based on sub-rewards and a forgetting mechanism. Through numerical experiments on a grid world task and a mountain car task, we show the validity of the proposed approach in terms of learning speed and accuracy.
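
The abstract does not say how the sub-rewards or the forgetting mechanism are defined, so the following is only a rough sketch of one possible reading, not the authors' method. It assumes tabular Q-learning on a hypothetical seven-cell corridor (a stand-in for the grid world task), an instruction given as a one-time bonus for reaching a waypoint, and "forgetting" modeled as an exponential decay of that bonus over episodes. All names and parameter values (`sub_reward`, `q_update`, `train`, `WAYPOINT`, `base`, `decay`) are illustrative assumptions.

```python
import random

# Hypothetical toy task: a corridor of cells 0..6, not the paper's grid world.
N_STATES = 7
START, WAYPOINT, GOAL = 0, 3, 6
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def sub_reward(base, decay, episode):
    """Assumed instruction bonus that fades ("is forgotten") over episodes."""
    return base * decay ** episode


def q_update(q_sa, reward, best_next, alpha, gamma):
    """Standard tabular Q-learning update for one state-action value."""
    return q_sa + alpha * (reward + gamma * best_next - q_sa)


def train(episodes=300, alpha=0.5, gamma=0.95, epsilon=0.1,
          base=0.5, decay=0.95, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for ep in range(episodes):
        state, bonus_given = START, False
        for _ in range(max_steps):
            # Epsilon-greedy action choice with random tie-breaking.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt = min(max(state + ACTIONS[a], 0), N_STATES - 1)
            reward = 1.0 if nxt == GOAL else 0.0  # main reward
            # Assumed sub-reward: paid once per episode at the waypoint.
            if nxt == WAYPOINT and not bonus_given:
                reward += sub_reward(base, decay, ep)
                bonus_given = True
            best_next = 0.0 if nxt == GOAL else max(q[nxt])
            q[state][a] = q_update(q[state][a], reward, best_next, alpha, gamma)
            state = nxt
            if state == GOAL:
                break
    return q


if __name__ == "__main__":
    for s, (left, right) in enumerate(train()):
        print(s, round(left, 3), round(right, 3))
```

In this sketch the decaying bonus steers early exploration toward the waypoint, while the main reward is left to dominate later episodes once the bonus has faded. Whether the paper's forgetting acts on the sub-rewards, on the learned values, or on something else cannot be determined from the abstract alone.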

Keywords

Reinforcement learning, Forgetting, Computer science, Task (project management), Artificial intelligence, Process (computing), Abstraction, Robot learning, Reinforcement, Q-learning
