Reinforcement Learning and Approximate Dynamic Programming for Feedback Control

Year
2012
Citations
573

Abstract

In this chapter, we extend the ADP algorithm, dual heuristic programming (DHP), to include a “bootstrapping” parameter λ, analogous to that used in the reinforcement learning algorithm TD(λ). The resulting algorithm, which we call VGL(λ) for value-gradient learning, is proven to produce a weight update that can be equivalent to backpropagation through time (BPTT) applied to a greedy policy on a critic function. This provides a surprising connection between the two alternative methods of BPTT and DHP. Under certain smoothness conditions, VGL(λ=1) with a greedy policy acquires the strong convergence conditions of BPTT, while using a general function approximator for the critic. We show that this can lead to increased stability in the learning of control problems by a neural network.
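
The abstract does not state the update itself, so the following is only an illustrative sketch of how a bootstrapping parameter λ might blend a critic's value-gradient estimate with a rolled-out target gradient. It assumes a backward recursion of the form G'_t = Dr/Dx_t + γ (Df/Dx_t)ᵀ (λ G'_{t+1} + (1 − λ) G̃_{t+1}), with a zero target gradient at the terminal state. The function name `vgl_lambda_targets`, its arguments, and the terminal condition are hypothetical and not taken from the chapter.

```python
def vgl_lambda_targets(dr_dx, df_dx, critic_grads, lam, gamma=1.0):
    """Backward recursion for lambda-blended target value-gradients (sketch).

    dr_dx[t]        : list of n floats, derivative of the step-t reward w.r.t. x_t
    df_dx[t]        : n x n nested list, df_dx[t][i][j] = d x_{t+1}[i] / d x_t[j]
    critic_grads[t] : list of n floats, critic's gradient estimate at x_t, t = 0..T
    Assumption (not from the source): the target gradient at x_T is zero.
    """
    T = len(dr_dx)
    n = len(dr_dx[0])
    targets = [None] * (T + 1)
    targets[T] = [0.0] * n  # assumed terminal condition
    for t in range(T - 1, -1, -1):
        # Blend the rolled-out target with the critic's own estimate at x_{t+1}.
        blended = [
            lam * targets[t + 1][i] + (1.0 - lam) * critic_grads[t + 1][i]
            for i in range(n)
        ]
        # Pull the blended gradient back through the transposed model Jacobian.
        targets[t] = [
            dr_dx[t][j]
            + gamma * sum(df_dx[t][i][j] * blended[i] for i in range(n))
            for j in range(n)
        ]
    return targets


# Scalar example: two steps, Jacobians 2 and 3, unit reward gradients.
dr_dx = [[1.0], [1.0]]
df_dx = [[[2.0]], [[3.0]]]
critic_grads = [[0.0], [10.0], [100.0]]
dhp_like = vgl_lambda_targets(dr_dx, df_dx, critic_grads, lam=0.0)
bptt_like = vgl_lambda_targets(dr_dx, df_dx, critic_grads, lam=1.0)
```

Under these assumptions, λ = 0 gives a one-step target that bootstraps entirely from the critic, in the style of DHP, while λ = 1 ignores the critic's estimates and depends only on the rolled-out trajectory, which is the regime the abstract relates to BPTT.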

Keywords

Reinforcement learning, Computer science, Feedback control, Reinforcement, Control (management), Dynamic programming, Artificial intelligence, Control engineering, Psychology, Engineering
