
LEARNING

Correct-by-synthesis reinforcement learning with temporal logic constraints

Min Wen, Ufuk Topcu

Year published: 2015
Citations: 11
Access: Open access

Abstract

We consider the problem of synthesizing reactive controllers that optimize an a priori unknown performance criterion while interacting with an uncontrolled environment, such that the system satisfies a given temporal logic specification. We decouple the problem into two subproblems. First, we extract a (maximally) permissive strategy for the system, which encodes multiple (possibly all) ways in which the system can react to the adversarial environment and satisfy the specification. Then, we quantify the a priori unknown performance criterion as a (still unknown) reward function and compute an optimal strategy for the system within the operating envelope allowed by the permissive strategy, using the so-called maximin-Q learning algorithm. We establish both correctness (with respect to the temporal logic specification) and optimality (with respect to the a priori unknown performance criterion) of this two-step technique for a fragment of temporal logic specifications. For specifications outside this fragment, correctness is still preserved, but the learned strategy may be sub-optimal. We present an algorithm for the overall problem and demonstrate its use and computational requirements on a set of robot motion planning examples.
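The two-step scheme described in the abstract can be illustrated on a toy problem. The following is a minimal sketch under our own assumptions, not the authors' implementation. It uses a hypothetical one-dimensional corridor game, and a plain safety specification ("never enter cells 0 or 6") stands in for the paper's temporal logic fragment. The system moves first and the environment then responds, so a pure-strategy max-min over a Q-table indexed by (state, system action, environment action) is used. The reward is a hypothetical movement cost that the learner only observes through the illustrative `step` function. Step one computes the maximally permissive strategy as a greatest fixpoint over the safe cells. Step two runs maximin-Q updates while exploring only the actions that the permissive strategy allows.

```python
import random

# Hypothetical toy game (not taken from the paper): a robot in cells 0..6 of a corridor.
CELLS = range(7)
UNSAFE = {0, 6}            # assumed safety spec: "always avoid cells 0 and 6"
SYS_ACTIONS = (-1, 0, 1)   # move chosen by the system (controller)
ENV_ACTIONS = (-1, 0, 1)   # disturbance chosen by the environment after seeing the move


def successor(s, a, b):
    """Deterministic transition: the system moves by a, then the environment pushes by b."""
    return s + a + b


def permissive_strategy():
    """Step 1: winning region of the safety game (greatest fixpoint) and, for each
    winning state, every system action that keeps all successors inside it."""
    win = {s for s in CELLS if s not in UNSAFE}
    while True:
        shrunk = {s for s in win
                  if any(all(successor(s, a, b) in win for b in ENV_ACTIONS)
                         for a in SYS_ACTIONS)}
        if shrunk == win:
            break
        win = shrunk
    allowed = {s: [a for a in SYS_ACTIONS
                   if all(successor(s, a, b) in win for b in ENV_ACTIONS)]
               for s in win}
    return win, allowed


def step(s, a, b):
    """Hypothetical black-box environment: the learner only sees the sampled reward.
    Assumed reward: a cost of 1 for every nonzero move."""
    return successor(s, a, b), -abs(a)


def maximin_value(q, s, allowed):
    """Guaranteed value: max over permitted system actions of min over environment actions."""
    return max(min(q[s, a, b] for b in ENV_ACTIONS) for a in allowed[s])


def maximin_q_learning(win, allowed, steps=20_000, alpha=0.5, gamma=0.5, start=3, seed=0):
    """Step 2: maximin-Q updates, exploring only actions the permissive strategy allows."""
    rng = random.Random(seed)
    q = {(s, a, b): 0.0 for s in win for a in allowed[s] for b in ENV_ACTIONS}
    s = start
    for _ in range(steps):
        a = rng.choice(allowed[s])      # exploration stays inside the envelope
        b = rng.choice(ENV_ACTIONS)     # environment plays randomly during training
        s_next, r = step(s, a, b)
        assert s_next in win            # the safety spec holds even while learning
        target = r + gamma * maximin_value(q, s_next, allowed)
        q[s, a, b] += alpha * (target - q[s, a, b])
        s = s_next
    # Greedy maximin strategy restricted to the permitted actions in each state.
    policy = {s: max(allowed[s], key=lambda a: min(q[s, a, b] for b in ENV_ACTIONS))
              for s in sorted(win)}
    return q, policy


if __name__ == "__main__":
    win, allowed = permissive_strategy()
    q, policy = maximin_q_learning(win, allowed)
    print("winning region:", sorted(win))
    print("allowed actions:", allowed)
    print("learned strategy:", policy)
```

With these assumed parameters, the winning region is cells 1 to 5. The learned strategy moves inward from cells 1 and 5 and stays put in cells 2 to 4, which is maximin-optimal for this movement cost. Exploration is restricted to `allowed[s]`, so every visited state stays in the winning region and the safety property holds during learning as well as for the final strategy. A simultaneous-move game would instead require mixed strategies, for example the linear program used in Littman's minimax-Q.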

Keywords

Correctness; A priori and a posteriori; Temporal logic; Computer science; Minimax; Reinforcement learning; Set (abstract data type); Fragment (logic); Envelope (radar); Linear temporal logic

