首页 /研究 /Multi-Step First: A Lightweight Deep Reinforcement Learning Strategy for Robust Continuous Control with Partial Observability

LEARNING

Multi-Step First: A Lightweight Deep Reinforcement Learning Strategy for Robust Continuous Control with Partial Observability

Lingheng Meng, Rob Gorbet, Michael Burke, Dana Kulić

发表年份: 2022
访问权限: 开放获取

摘要

Deep Reinforcement Learning (DRL) has made considerable advances in simulated and physical robot control tasks, especially when problems admit a fully observed Markov Decision Process (MDP) formulation. When observations only partially capture the underlying state, the problem becomes a Partially Observable MDP (POMDP), and performance rankings between algorithms can change. We empirically compare Proximal Policy Optimization (PPO), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC) on representative POMDP variants of continuous-control benchmarks. Contrary to widely reported MDP results where TD3 and SAC typically outperform PPO, we observe an inversion: PPO attains higher robustness under partial observability. We attribute this to the stabilizing effect of multi-step bootstrapping. Furthermore, incorporating multi-step targets into TD3 (MTD3) and SAC (MSAC) improves their robustness. These findings provide practical guidance for selecting and adapting DRL algorithms in partially observable settings without requiring new theoretical machinery.

关键词

cs.ROcs.AI

Multi-Step First: A Lightweight Deep Reinforcement Learning Strategy for Robust Continuous Control with Partial Observability

摘要

关键词

相关论文

The Organization of Behavior

Fractional Brownian Motions, Fractional Noises and Applications

Review of deep learning: concepts, CNN architectures, challenges, applications, future directions

A guide to deep learning in healthcare