

Dynamic Actor-Advisor Programming for Scalable Safe Reinforcement Learning

Lingwei Zhu, Yunduan Cui, Takamitsu Matsubara

Publication year: 2020
Citations: 6

Abstract

Real-world robots operate under complex, strict constraints. Safe reinforcement learning algorithms that simultaneously minimize the total cost and the risk of constraint violation are therefore crucial. However, to the best of our knowledge, almost no such algorithms scale to high-dimensional systems. In this paper, we propose Dynamic Actor-Advisor Programming (DAAP), an algorithm for sample-efficient and scalable safe reinforcement learning. DAAP employs two control policies, an actor and an advisor. They are updated in an interleaved manner to minimize the total cost and the risk of constraint violation, and each moves smoothly toward the other because it uses the other as the baseline policy in the Kullback-Leibler divergence term of the Dynamic Policy Programming framework. We demonstrate the scalability and sample efficiency of DAAP by applying it to simulated robot-arm control tasks and comparing its performance against baselines.
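
The mutual-baseline idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it is a single-state example that relies only on the standard closed-form solution of a KL-regularized objective, where minimizing expected cost plus (1/eta) times KL(pi || baseline) gives pi(a) proportional to baseline(a) * exp(-eta * cost(a)). The assignment of task cost to the actor and risk to the advisor, the three actions, the numeric values, `eta`, and the function name `kl_regularized_update` are all assumptions made for illustration.

```python
import math


def kl_regularized_update(baseline, costs, eta):
    """Minimize sum_a pi(a)*costs[a] + (1/eta) * KL(pi || baseline).

    The closed-form minimizer is pi(a) proportional to
    baseline(a) * exp(-eta * costs[a]).
    """
    # Subtracting the minimum cost improves numerical stability;
    # the shift cancels when the weights are normalized.
    c_min = min(costs)
    weights = [b * math.exp(-eta * (c - c_min)) for b, c in zip(baseline, costs)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical single-state problem with three actions (values are assumed).
task_cost = [1.0, 0.2, 0.6]  # assumed: the quantity the actor minimizes
risk = [0.0, 0.9, 0.1]       # assumed: the quantity the advisor minimizes

actor = [1 / 3, 1 / 3, 1 / 3]
advisor = [1 / 3, 1 / 3, 1 / 3]

for _ in range(20):
    # Each policy is regularized toward the *other* policy from the previous step.
    new_actor = kl_regularized_update(advisor, task_cost, eta=1.0)
    new_advisor = kl_regularized_update(actor, risk, eta=1.0)
    actor, advisor = new_actor, new_advisor

best_actor_action = actor.index(max(actor))
best_advisor_action = advisor.index(max(advisor))
print(best_actor_action, best_advisor_action)
```

Under these assumed numbers, action 1 has the lowest task cost but the highest risk, and action 0 has zero risk but the highest task cost. Because each policy keeps being pulled toward the other, both end up concentrating on action 2, which has the lowest combined cost and risk.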

Keywords

Reinforcement learning, Scalability, Computer science, Dynamic programming, Constraint (computer-aided design), Robot, Baseline (sea), Sample (material), Divergence (linguistics), Constraint programming
