Reinforcement Learning with Perturbed Rewards
Jingkang Wang, Yang Liu, Bo Li
- Year
- 2018
- Access
- Open access
Abstract
Recent studies have shown that reinforcement learning (RL) models are vulnerable in various noisy scenarios. For instance, the observed reward channel is often subject to noise in practice (e.g., when rewards are collected through sensors), and is therefore not credible. In addition, for applications such as robotics, a deep reinforcement learning (DRL) algorithm can be manipulated to produce arbitrary errors by receiving corrupted rewards. In this paper, we consider noisy RL problems with perturbed rewards, which can be approximated with a confusion matrix. We develop a robust RL framework that enables agents to learn in noisy environments where only perturbed rewards are observed. Our solution framework builds on existing RL/DRL algorithms and firstly addresses the biased noisy reward setting without any assumptions on the true distribution (e.g., zero-mean Gaussian noise as made in previous works). The core ideas of our solution include estimating a reward confusion matrix and defining a set of unbiased surrogate rewards. We prove the convergence and sample complexity of our approach. Extensive experiments on different DRL platforms show that trained policies based on our estimated surrogate reward can achieve higher expected rewards, and converge faster than existing baselines. For instance, the state-of-the-art PPO algorithm is able to obtain 84.6% and 80.8% improvements on average score for five Atari games, with error rates as 10% and 30% respectively.
Keywords
Related papers
Parallel Differentiable Reachability for Learning and Planning with Certified Neural Dynamics and Controllers
Keyi Shen, Glen Chou
2026
Artificial Intelligence enhanced smart welding islands: Foundation models revolutionizing manufacturing
Xiwei Wu, Wei Wu, Qiqi Chen +6 more
Robotics and Computer-Integrated Manufacturing · 2026
A deep reinforcement learning and a dynamic graph neural network-based scheduling agent to control a multi-task robot
Hedi Boukamcha, Anas Neumann, Monia Rekik +3 more
Robotics and Computer-Integrated Manufacturing · 2026
LLM Agent-driven Automated DFA Assessment with Fine-tuning and AAS-based RAG
Jiaxin Liu, Xiaofeng Zhou, Suyang Yu +5 more
Robotics and Computer-Integrated Manufacturing · 2026