Value Functions for Temporal Logic: Optimal Policies and Safety Filters

Oswin So, William Sharpless, Sylvia Herbert, Chuchu Fan

发表年份: 2026
访问权限: 开放获取

摘要

While Bellman equations for basic reach, avoid, and reach-avoid problems are well studied, the relationship between value optimality and policy optimality becomes subtle in the undiscounted infinite-horizon setting, particularly for more complicated tasks. Greedily maximizing the Q-function can produce policies that indefinitely defer task completion for reach-avoid problems, or equivalently, Until specifications, even when the value function is optimal. Building upon recent results decomposing the value function for temporal logic (TL) into a graph of constituent value functions, we construct non-Markovian policies based on state history that avoid this pathology and prove their optimality with respect to the quantitative robustness score for nested Until, Globally, and Globally-Until specifications. We further show how the Q function can serve as a safety filter for complex TL specifications, extending prior results beyond simple avoid or reach-avoid tasks.

关键词

cs.ROcs.AIcs.LGcs.LOmath.OC

Value Functions for Temporal Logic: Optimal Policies and Safety Filters

摘要

关键词

相关论文

Statistical Learning Theory

Fractional Differential Equations

Applied Nonlinear Control

Genetic Programming: On the Programming of Computers by Means of Natural Selection