
Reinforcement Learning and Approximate Dynamic Programming for Feedback Control

Publication year
2012
Citations
573

Abstract

In this chapter, we extend the ADP algorithm, dual heuristic programming (DHP), to include a “bootstrapping” parameter λ, analogous to that used in the reinforcement learning algorithm TD(λ). The resulting algorithm, which we call VGL(λ) for value-gradient learning, is proven to produce a weight update that can be equivalent to backpropagation through time (BPTT) applied to a greedy policy on a critic function. This provides a surprising connection between the two alternative methods of BPTT and DHP. Under certain smoothness conditions, VGL(λ=1) with a greedy policy acquires the strong convergence conditions of BPTT, while using a general function approximator for the critic. We show that this can lead to increased stability in the learning of control problems by a neural network.
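
To make the role of λ concrete, the sketch below computes value-gradient targets along a single trajectory with a backward recursion that blends the next target with the critic's estimate, in the spirit of the TD(λ) λ-return. It is an illustration under stated assumptions, not the chapter's exact formulation: the policy is assumed to be folded into closed-loop dynamics (so the greedy-policy derivatives are not modeled), the Jacobian convention is entry [i, j] = ∂f_i/∂x_j, and every name here (`vgl_lambda_targets`, `rollout`, `discounted_return`, the toy matrices `A` and `Q`) is hypothetical.

```python
import numpy as np


def vgl_lambda_targets(xs, reward_grad, dyn_jac, critic, gamma, lam):
    """Backward recursion for value-gradient targets along one trajectory.

    xs          : states x_0..x_T, shape (T + 1, n); x_T is terminal.
    reward_grad : x -> gradient of the reward w.r.t. x, shape (n,).
    dyn_jac     : x -> Jacobian of the closed-loop dynamics, [i, j] = df_i/dx_j.
    critic      : x -> approximate value gradient at x, shape (n,).
    """
    xs = np.asarray(xs, dtype=float)
    T = len(xs) - 1
    targets = np.zeros_like(xs)  # the target at the terminal state stays zero
    for t in range(T - 1, -1, -1):
        if t + 1 == T:
            # Assumption: no reward and no value beyond the terminal state.
            bootstrap = np.zeros(xs.shape[1])
        else:
            # lam = 0 uses only the critic; lam = 1 uses only the next target.
            bootstrap = lam * targets[t + 1] + (1.0 - lam) * critic(xs[t + 1])
        targets[t] = reward_grad(xs[t]) + gamma * dyn_jac(xs[t]).T @ bootstrap
    return targets


# Hypothetical toy problem: linear closed-loop dynamics, quadratic reward.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
Q = np.diag([1.0, 0.5])
GAMMA = 0.95
HORIZON = 5


def rollout(x0):
    """Simulate x_{t+1} = A x_t for HORIZON steps."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(HORIZON):
        xs.append(A @ xs[-1])
    return np.array(xs)


def discounted_return(x0):
    """Sum of discounted rewards r(x) = -0.5 x^T Q x over x_0..x_{T-1}."""
    xs = rollout(x0)
    return sum(GAMMA**t * (-0.5 * xs[t] @ Q @ xs[t]) for t in range(HORIZON))


def reward_grad(x):
    return -Q @ x


def dyn_jac(x):
    return A


def zero_critic(x):
    return np.zeros_like(x)
```

In this sketch, setting `lam=1.0` makes the critic drop out of the recursion, so the targets are the gradients of the discounted return obtained by differentiating back through the trajectory. Setting `lam=0.0` leaves a one-step bootstrapped target that depends on the critic at the next state.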

Keywords

Reinforcement learning, Computer science, Feedback control, Reinforcement, Control (management), Dynamic programming, Artificial intelligence, Control engineering, Psychology, Engineering
