
Constrained reinforcement learning from intrinsic and extrinsic rewards

Eiji Uchibe, Kenji Doya

Year published: 2007
Citations: 49

Abstract

The main objective of a standard reinforcement learner is usually defined as the maximization of a scalar reward function given externally by the environment. An intrinsically motivated reinforcement learner, on the other hand, creates an intrinsic reward function from its own criteria, such as curiosity, prediction error, and learning progress. This paper proposes a novel approach that handles both intrinsic and extrinsic rewards in reinforcement learning by treating the problem as one of constrained optimization. The extrinsic rewards impose inequality constraints on the stochastic policy, while the intrinsic reward determines the current objective function of the learning agent. By integrating policy gradient reinforcement learning algorithms with techniques from nonlinear programming, the proposed method, named constrained policy gradient reinforcement learning (CPGRL), maximizes the long-term average intrinsic reward subject to the inequality constraints induced by the extrinsic rewards. CPGRL is successfully applied to a simple MDP problem and to a control task for a robot arm.
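The constrained formulation in the abstract (maximize the average intrinsic reward J_I(θ) subject to J_E(θ) ≥ G for the average extrinsic reward) can be illustrated with a toy example. The sketch below is NOT the CPGRL update from the paper; it swaps in a different, commonly used constraint-handling technique: Lagrangian relaxation, where the policy ascends J_I + λ(J_E − G) and the multiplier λ is set by a proportional-plus-integral rule on the constraint violation. Everything here is a hypothetical assumption: a single-state, 3-action problem (so the long-term average reward equals the expected per-step reward), the reward tables R_INT and R_EXT, the threshold G, the step sizes, the function names, and the use of exact expected gradients instead of sampled policy gradient estimates.

```python
import numpy as np

# Hypothetical single-state problem with 3 actions: the long-term average
# reward of a softmax policy is just its expected per-step reward.
R_INT = np.array([1.0, 0.5, 0.0])  # intrinsic reward per action (objective)
R_EXT = np.array([0.0, 0.6, 1.0])  # extrinsic reward per action (constrained)
G = 0.5                            # assumed lower bound on average extrinsic reward


def softmax(theta):
    """Numerically stable softmax over action logits."""
    z = np.exp(theta - theta.max())
    return z / z.sum()


def policy_gradient(pi, r):
    """Exact gradient of E_pi[r] w.r.t. softmax logits: pi * (r - E_pi[r])."""
    return pi * (r - pi @ r)


def constrained_pg(steps=10000, alpha=0.5, beta=0.1, k_p=5.0):
    """Lagrangian policy gradient with a PI-style multiplier (hypothetical sketch).

    alpha: policy step size; beta: integral gain; k_p: proportional gain.
    """
    theta = np.zeros(3)
    lam_int = 0.0  # integral part of the multiplier, kept non-negative
    lam = 0.0
    for _ in range(steps):
        pi = softmax(theta)
        violation = G - pi @ R_EXT  # positive when J_E < G
        lam_int = max(0.0, lam_int + beta * violation)
        lam = max(0.0, lam_int + k_p * violation)
        # Ascend J_I + lam * J_E; both are linear in pi, so their gradients add.
        theta += alpha * policy_gradient(pi, R_INT + lam * R_EXT)
    pi = softmax(theta)
    return pi, pi @ R_INT, pi @ R_EXT, lam


if __name__ == "__main__":
    pi, j_int, j_ext, lam = constrained_pg()
    print("policy:", np.round(pi, 3))
    print("J_I:", round(j_int, 3), "J_E:", round(j_ext, 3), "lambda:", round(lam, 3))
```

For these made-up rewards, the constrained optimum mixes action 0 and action 1 with probabilities 1/6 and 5/6, which gives J_E = 0.5, J_I = 7/12, and multiplier λ = 5/6 (the value at which actions 0 and 1 score equally on the Lagrangian). The learned policy should approach this point, while an unconstrained learner would pick action 0 and drive J_E to 0. The proportional term is there to damp the oscillation that a plain integral (gradient-descent) multiplier update tends to produce.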

Keywords

Reinforcement learning, Computer science, Maximization, Curiosity, Reinforcement, Function (biology), Mathematical optimization, Task (project management), Artificial intelligence, Mathematics
