

Reinforcement Learning and Approximate Dynamic Programming for Feedback Control

Year of publication: 2012
Citations: 573

Abstract

In this chapter, we extend the ADP algorithm dual heuristic programming (DHP) to include a “bootstrapping” parameter λ, analogous to that used in the reinforcement learning algorithm TD(λ). The resulting algorithm, which we call VGL(λ) for value-gradient learning, is proven to produce a weight update that can be equivalent to backpropagation through time (BPTT) applied to a greedy policy on a critic function. This provides a surprising connection between the two alternative methods of BPTT and DHP. Under certain smoothness conditions, VGL(λ=1) with a greedy policy acquires the strong convergence conditions of BPTT, while using a general function approximator for the critic. We show that this can lead to increased stability in the learning of control problems by a neural network.
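
The role of the bootstrapping parameter λ can be illustrated with a minimal sketch. It is not taken from the chapter: it assumes a scalar state, a finite trajectory, and a commonly cited form of the λ-blended value-gradient target, in which each target mixes the next step's target (weight λ) with the critic's own gradient estimate (weight 1 − λ). The function name `vgl_lambda_targets` and its arguments (`dr_dx`, `df_dx`, `critic_grad`) are hypothetical, and the derivatives are assumed to be precomputed total derivatives along the trajectory.

```python
def vgl_lambda_targets(dr_dx, df_dx, critic_grad, gamma, lam):
    """Backward recursion for lambda-blended value-gradient targets.

    Assumptions (illustrative, not from the source):
      dr_dx[t]       total derivative of the reward at step t w.r.t. the state;
                     the last entry is treated as the terminal gradient
      df_dx[t]       total derivative of the next state w.r.t. the state at step t
      critic_grad[t] the critic's value-gradient estimate at step t
    All quantities are scalars for simplicity.
    """
    horizon = len(df_dx)
    if len(dr_dx) != horizon + 1 or len(critic_grad) != horizon + 1:
        raise ValueError("dr_dx and critic_grad need one more entry than df_dx")

    targets = [0.0] * (horizon + 1)
    # Assumed terminal condition: the final target is the terminal reward gradient.
    targets[horizon] = dr_dx[horizon]

    for t in range(horizon - 1, -1, -1):
        # Blend the next target with the critic's estimate at the next step.
        bootstrap = lam * targets[t + 1] + (1.0 - lam) * critic_grad[t + 1]
        targets[t] = dr_dx[t] + gamma * df_dx[t] * bootstrap
    return targets
```

Under these assumptions, `lam=0` gives a one-step target that relies entirely on the critic's next-step estimate, in the spirit of DHP, while `lam=1` ignores the critic and propagates the reward gradients back along the whole trajectory, which is the quantity BPTT computes.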

Keywords

Reinforcement learning, Computer science, Feedback control, Reinforcement, Control (management), Dynamic programming, Artificial intelligence, Control engineering, Psychology, Engineering
