A Multiplicative Value Function for Safe and Efficient Reinforcement Learning
Nick Bührer, Zhejun Zhang, Alexander Liniger, Fisher Yu, Luc Van Gool
- 发表年份
- 2023
- 访问权限
- 开放获取
摘要
An emerging field of sequential decision problems is safe Reinforcement Learning (RL), where the objective is to maximize the reward while obeying safety constraints. Being able to handle constraints is essential for deploying RL agents in real-world environments, where constraint violations can harm the agent and the environment. To this end, we propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic. The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns. By splitting responsibilities, we facilitate the learning task leading to increased sample efficiency. We integrate our approach into two popular RL algorithms, Proximal Policy Optimization and Soft Actor-Critic, and evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations. Finally, we make the zero-shot sim-to-real transfer where a differential drive robot has to navigate through a cluttered room. Our code can be found at https://github.com/nikeke19/Safe-Mult-RL.
关键词
相关论文
面向学习与规划的并行可微可达性:具有认证神经动力学与控制器的系统
Keyi Shen, Glen Chou
2026
人工智能增强的智能焊接岛:基础模型革新制造业
Xiwei Wu, Wei Wu, Qiqi Chen 等 9 位作者
Robotics and Computer-Integrated Manufacturing · 2026
基于深度强化学习和动态图神经网络的多任务机器人调度代理
Hedi Boukamcha, Anas Neumann, Monia Rekik 等 6 位作者
Robotics and Computer-Integrated Manufacturing · 2026
基于微调与AAS增强检索的LLM驱动自动化DFA评估
Jiaxin Liu, Xiaofeng Zhou, Suyang Yu 等 8 位作者
Robotics and Computer-Integrated Manufacturing · 2026