Constrained reinforcement learning from intrinsic and extrinsic rewards

Eiji Uchibe, Kenji Doya

Year: 2007
Citations: 49

Abstract

The main objective of a standard reinforcement learner is usually defined as maximization of a scalar reward function given externally by the environment. In contrast, an intrinsically motivated reinforcement learner creates an intrinsic reward function from its own criteria, such as curiosity, prediction error, and learning progress. This paper proposes a novel approach that handles both intrinsic and extrinsic rewards in reinforcement learning from the viewpoint of constrained optimization. The extrinsic rewards impose inequality constraints on the stochastic policy, while the intrinsic reward determines the current objective function of the learning agent. By integrating policy gradient reinforcement learning algorithms with techniques used in nonlinear programming, our proposed method, named constrained policy gradient reinforcement learning (CPGRL), maximizes the long-term average intrinsic reward under the inequality constraints induced by the extrinsic rewards. CPGRL is successfully applied to a simple MDP problem and to a control task for a robot arm.
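The constrained formulation in the abstract (maximize the average intrinsic reward J_I(θ) subject to an extrinsic-reward constraint J_E(θ) ≥ b) can be illustrated with a minimal sketch. This is not the authors' CPGRL algorithm. It assumes a single-state problem, so the long-term average reward reduces to the expected reward of the stochastic policy; a softmax policy with exact policy gradients; only one extrinsic constraint; and a log-barrier method with backtracking line search as the nonlinear-programming component, which may differ from the technique used in the paper. The function names, reward values, threshold b, and barrier weight mu are hypothetical choices for illustration.

```python
import math

# Hypothetical sketch: log-barrier constrained policy gradient on a one-state
# (bandit) problem. Not the paper's exact CPGRL update rule.


def softmax(theta):
    """Softmax over action preferences, shifted by the max for stability."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def expected_reward(pi, r):
    """Average reward of a stochastic policy when there is a single state."""
    return sum(p * x for p, x in zip(pi, r))


def policy_gradient(pi, r):
    """Exact softmax policy gradient: dJ/dtheta_a = pi_a * (r_a - J)."""
    j = expected_reward(pi, r)
    return [p * (x - j) for p, x in zip(pi, r)]


def barrier_objective(theta, r_int, r_ext, b, mu):
    """Intrinsic return plus a log barrier that keeps J_E strictly above b."""
    pi = softmax(theta)
    slack = expected_reward(pi, r_ext) - b
    if slack <= 0.0:
        return -math.inf  # infeasible: outside the barrier's domain
    return expected_reward(pi, r_int) + mu * math.log(slack)


def constrained_pg(r_int, r_ext, b, mu=0.01, iters=5000, lr=1.0):
    """Maximize J_I subject to J_E >= b (hypothetical helper, one constraint)."""
    theta = [0.0] * len(r_int)  # uniform start; assumed strictly feasible
    if barrier_objective(theta, r_int, r_ext, b, mu) == -math.inf:
        raise ValueError("initial policy must satisfy J_E > b")
    for _ in range(iters):
        pi = softmax(theta)
        slack = expected_reward(pi, r_ext) - b
        g_int = policy_gradient(pi, r_int)
        g_ext = policy_gradient(pi, r_ext)
        # Gradient of J_I + mu * log(J_E - b) with respect to theta.
        grad = [gi + mu * ge / slack for gi, ge in zip(g_int, g_ext)]
        sq_norm = sum(g * g for g in grad)
        current = barrier_objective(theta, r_int, r_ext, b, mu)
        step = lr
        # Backtracking (Armijo) line search; infeasible candidates score -inf
        # and are rejected, so every accepted iterate stays feasible.
        for _ in range(50):
            cand = [t + step * g for t, g in zip(theta, grad)]
            if barrier_objective(cand, r_int, r_ext, b, mu) >= current + 1e-4 * step * sq_norm:
                theta = cand
                break
            step *= 0.5
    pi = softmax(theta)
    return pi, expected_reward(pi, r_int), expected_reward(pi, r_ext)


# Toy rewards (made up): action 0 is the most "interesting" intrinsically
# but earns no extrinsic reward.
pi, j_int, j_ext = constrained_pg(r_int=[1.0, 0.0, 0.3], r_ext=[0.0, 1.0, 0.8], b=0.5)
print([round(p, 3) for p in pi], round(j_int, 3), round(j_ext, 3))
```

In this toy problem the unconstrained intrinsic optimum is to always pick action 0, which earns no extrinsic reward. The barrier term keeps every iterate strictly feasible, so the policy should settle near a mixture of actions 0 and 2 (the constrained optimum of this linear problem is π = (0.375, 0, 0.625) with J_I = 0.5625) instead of collapsing onto action 0. Shrinking mu moves the solution closer to the constraint boundary at the cost of a stiffer line search.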

Keywords

Reinforcement learning, Computer science, Maximization, Curiosity, Reinforcement, Function (biology), Mathematical optimization, Task (project management), Artificial intelligence, Mathematics