Reinforcement learning with temporal logic rewards

Xiao Li, Cristian-Ioan Vasile, Călin Belta

Year of publication: 2017
Citations: 160

Abstract

Reinforcement learning (RL) depends critically on the choice of reward functions used to capture the desired behavior and constraints of a robot. Usually, these are handcrafted by an expert designer and represent heuristics for relatively simple tasks. Real-world applications typically involve more complex tasks with rich temporal and logical structure. In this paper we take advantage of the expressive power of temporal logic (TL) to specify complex rules the robot should follow, and to incorporate domain knowledge into learning. We propose Truncated Linear Temporal Logic (TLTL) as a specification language that is arguably well suited for robotics applications. We show in simulated trials that learning is faster and that policies obtained using the proposed approach outperform those learned using heuristic rewards in terms of the robustness degree, i.e., how well the tasks are satisfied. Furthermore, we demonstrate the proposed RL approach in a toast-placing task learned by a Baxter robot.
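To make the notion of a robustness degree concrete, the following is a minimal sketch in Python. It is not the TLTL semantics defined in the paper: it assumes the common max/min quantitative semantics for temporal logic ("eventually" as a maximum over time, "always" as a minimum, conjunction as a minimum), and the task, function names, and parameters are hypothetical. The returned value is positive when the trajectory satisfies "eventually reach the goal and always avoid the obstacle", and its magnitude indicates the margin, so it could serve as an episode-level reward.

```python
import math

def eventually(values):
    # Assumed semantics: "eventually p" is as robust as the best value of p over time.
    return max(values)

def always(values):
    # Assumed semantics: "always p" is as robust as the worst value of p over time.
    return min(values)

def robustness(trajectory, goal, goal_radius, obstacle, obstacle_radius):
    """Robustness of the hypothetical task
    'eventually reach the goal AND always avoid the obstacle'
    for a finite trajectory of 2-D points."""
    # Predicate "inside goal region": positive when within goal_radius of the goal.
    reach = [goal_radius - math.dist(p, goal) for p in trajectory]
    # Predicate "outside obstacle": positive when farther than obstacle_radius.
    avoid = [math.dist(p, obstacle) - obstacle_radius for p in trajectory]
    # Conjunction is taken as the minimum of the two sub-formula robustness values.
    return min(eventually(reach), always(avoid))

# Hypothetical trajectories: one goes straight to the goal, one passes through the obstacle.
safe_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
unsafe_path = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
safe_score = robustness(safe_path, (2.0, 0.0), 0.5, (1.0, 2.0), 0.5)
unsafe_score = robustness(unsafe_path, (2.0, 0.0), 0.5, (1.0, 2.0), 0.5)
```

Under these assumptions the safe trajectory receives a positive score and the one that crosses the obstacle receives a negative score, which is the sense in which a robustness degree measures how well a task is satisfied.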

Keywords

Computer science, Reinforcement learning, Heuristics, Artificial intelligence, Linear temporal logic, Robustness (evolution), Robot, Temporal logic, Heuristic, Temporal logic of actions
