USHER: Unbiased Sampling for Hindsight Experience Replay

Liam Schramm, Yunfu Deng, Edgar Granados, Abdeslam Boularias

发表年份: 2022
引用次数: 2
访问权限: 开放获取

摘要

Dealing with sparse rewards is a long-standing challenge in reinforcement learning (RL). Hindsight Experience Replay (HER) addresses this problem by reusing failed trajectories for one goal as successful trajectories for another. This allows for both a minimum density of reward and for generalization across multiple goals. However, this strategy is known to result in a biased value function, as the update rule underestimates the likelihood of bad outcomes in a stochastic environment. We propose an asymptotically unbiased importance-sampling-based algorithm to address this problem without sacrificing performance on deterministic environments. We show its effectiveness on a range of robotic systems, including challenging high dimensional stochastic environments.

关键词

Hindsight biasComputer scienceGeneralizationReinforcement learningSampling (signal processing)Mathematical optimizationThompson samplingRange (aeronautics)Bellman equationFunction (biology)

USHER: Unbiased Sampling for Hindsight Experience Replay

摘要

关键词

相关论文

Statistical Learning Theory

Artificial intelligence: a modern approach

Applied Nonlinear Control

A new optimizer using particle swarm theory