Implicit Constraint-Aware Off-Policy Correction for Offline Reinforcement Learning
Ali Baheri
- 发表年份
- 2025
- 访问权限
- 开放获取
摘要
Offline reinforcement learning promises policy improvement from logged interaction data alone, yet state-of-the-art algorithms remain vulnerable to value over-estimation and to violations of domain knowledge such as monotonicity or smoothness. We introduce implicit constraint-aware off-policy correction, a framework that embeds structural priors directly inside every Bellman update. The key idea is to compose the optimal Bellman operator with a proximal projection on a convex constraint set, which produces a new operator that (i) remains a $γ$-contraction, (ii) possesses a unique fixed point, and (iii) enforces the prescribed structure exactly. A differentiable optimization layer solves the projection; implicit differentiation supplies gradients for deep function approximators at a cost comparable to implicit Q-learning. On a synthetic Bid-Click auction -- where the true value is provably monotone in the bid -- our method eliminates all monotonicity violations and outperforms conservative Q-learning and implicit Q-learning in return, regret, and sample efficiency.
关键词
相关论文
面向学习与规划的并行可微可达性:具有认证神经动力学与控制器的系统
Keyi Shen, Glen Chou
2026
人工智能增强的智能焊接岛:基础模型革新制造业
Xiwei Wu, Wei Wu, Qiqi Chen 等 9 位作者
Robotics and Computer-Integrated Manufacturing · 2026
基于深度强化学习和动态图神经网络的多任务机器人调度代理
Hedi Boukamcha, Anas Neumann, Monia Rekik 等 6 位作者
Robotics and Computer-Integrated Manufacturing · 2026
基于微调与AAS增强检索的LLM驱动自动化DFA评估
Jiaxin Liu, Xiaofeng Zhou, Suyang Yu 等 8 位作者
Robotics and Computer-Integrated Manufacturing · 2026