Reinforcement Learning and Approximate Dynamic Programming for Feedback Control

Year: 2012
Citations: 573

Abstract

In this chapter, we extend the ADP algorithm dual heuristic programming (DHP) to include a “bootstrapping” parameter λ, analogous to that used in the reinforcement learning algorithm TD(λ). The resulting algorithm, which we call VGL(λ) for value-gradient learning, is proven to produce a weight update that can be equivalent to backpropagation through time (BPTT) applied to a greedy policy on a critic function. This provides a surprising connection between the two alternative methods, BPTT and DHP. Under certain smoothness conditions, VGL(λ=1) with a greedy policy acquires the strong convergence conditions of BPTT while using a general function approximator for the critic. We show that this can lead to increased stability when a neural network learns control problems.

Keywords

Reinforcement learning, Computer science, Feedback control, Reinforcement, Control (management), Dynamic programming, Artificial intelligence, Control engineering, Psychology, Engineering
