Constrained reinforcement learning from intrinsic and extrinsic rewards

Eiji Uchibe, Kenji Doya

Year: 2007
Citations: 49

Abstract

The main objective of a standard reinforcement learner is usually defined as maximizing a scalar reward function given externally by the environment. An intrinsically motivated reinforcement learner, on the other hand, creates an intrinsic reward function from its own criteria, such as curiosity, prediction error, and learning progress. This paper proposes a novel approach that handles both intrinsic and extrinsic rewards in reinforcement learning from the viewpoint of constrained optimization. The extrinsic rewards define inequality constraints on the stochastic policy, while the intrinsic reward determines the current objective function for the learning agent. By integrating policy gradient reinforcement learning algorithms with techniques used in nonlinear programming, the proposed method, named constrained policy gradient reinforcement learning (CPGRL), maximizes the long-term average intrinsic reward under the inequality constraints induced by the extrinsic rewards. CPGRL is successfully applied to a simple MDP problem and to a control task for a robot arm.
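The constrained formulation in the abstract can be illustrated on a toy problem. The sketch below is not the CPGRL algorithm from the paper: it assumes a single-state, two-action problem with a softmax policy, uses exact gradients instead of sampled policy gradient estimates, and swaps in a simple quadratic penalty method as a stand-in for the nonlinear programming technique. All names (`train`, `r_int`, `r_ext`, `threshold`, `penalty`) and all numeric values are hypothetical choices made for illustration.

```python
import math


def softmax(theta):
    """Convert logits into action probabilities (numerically stable)."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def average_reward(probs, rewards):
    """Expected one-step reward under the policy (single-state problem)."""
    return sum(p * r for p, r in zip(probs, rewards))


def reward_gradient(probs, rewards):
    """Exact gradient of the average reward with respect to the softmax logits."""
    j = average_reward(probs, rewards)
    return [p * (r - j) for p, r in zip(probs, rewards)]


def train(r_int, r_ext, threshold, penalty=100.0, lr=0.05, steps=5000):
    """Maximize average intrinsic reward subject to
    average extrinsic reward >= threshold, via a quadratic penalty.
    This is an illustrative stand-in, not the paper's update rule."""
    theta = [0.0] * len(r_int)
    for _ in range(steps):
        probs = softmax(theta)
        # Amount by which the extrinsic constraint is currently violated.
        violation = max(0.0, threshold - average_reward(probs, r_ext))
        g_int = reward_gradient(probs, r_int)
        g_ext = reward_gradient(probs, r_ext)
        # Ascend the intrinsic objective; when the constraint is violated,
        # also ascend the extrinsic reward, scaled by the violation.
        theta = [
            t + lr * (gi + penalty * violation * ge)
            for t, gi, ge in zip(theta, g_int, g_ext)
        ]
    probs = softmax(theta)
    return probs, average_reward(probs, r_int), average_reward(probs, r_ext)


# Action 0 yields intrinsic reward only, action 1 yields extrinsic reward only.
probs, j_int, j_ext = train([1.0, 0.0], [0.0, 1.0], threshold=0.3)
print(probs, j_int, j_ext)
```

With these assumed rewards, the unconstrained optimum would always pick action 0, so the constraint is what forces the policy to keep some probability on action 1. A quadratic penalty only enforces the constraint approximately: the extrinsic average settles slightly below the threshold, by roughly `1 / penalty` in this toy setup.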

Keywords

Reinforcement learning, Computer science, Maximization, Curiosity, Reinforcement, Function (biology), Mathematical optimization, Task (project management), Artificial intelligence, Mathematics
