Composite Q-learning: Multi-scale Q-function Decomposition and Separable Optimization
Gabriel Kalweit, Maria Huegle, Joschka Boedecker
- 发表年份
- 2019
- 访问权限
- 开放获取
摘要
In the past few years, off-policy reinforcement learning methods have shown promising results in their application for robot control. Deep Q-learning, however, still suffers from poor data-efficiency and is susceptible to stochasticity in the environment or reward functions which is limiting with regard to real-world applications. We alleviate these problems by proposing two novel off-policy Temporal-Difference formulations: (1) Truncated Q-functions which represent the return for the first n steps of a target-policy rollout w.r.t. the full action-value and (2) Shifted Q-functions, acting as the farsighted return after this truncated rollout. This decomposition allows us to optimize both parts with their individual learning rates, achieving significant learning speedup. We prove that the combination of these short- and long-term predictions is a representation of the full return, leading to the Composite Q-learning algorithm. We show the efficacy of Composite Q-learning in the tabular case and compare Deep Composite Q-learning with TD3 and TD3(Delta), which we introduce as an off-policy variant of TD(Delta). Moreover, we show that Composite TD3 outperforms TD3 as well as state-of-the-art compositional Q-learning approaches significantly in terms of data-efficiency in multiple simulated robot tasks and that Composite Q-learning is robust to stochastic environments and reward functions.
关键词
相关论文
面向学习与规划的并行可微可达性:具有认证神经动力学与控制器的系统
Keyi Shen, Glen Chou
2026
人工智能增强的智能焊接岛:基础模型革新制造业
Xiwei Wu, Wei Wu, Qiqi Chen 等 9 位作者
Robotics and Computer-Integrated Manufacturing · 2026
基于深度强化学习和动态图神经网络的多任务机器人调度代理
Hedi Boukamcha, Anas Neumann, Monia Rekik 等 6 位作者
Robotics and Computer-Integrated Manufacturing · 2026
基于微调与AAS增强检索的LLM驱动自动化DFA评估
Jiaxin Liu, Xiaofeng Zhou, Suyang Yu 等 8 位作者
Robotics and Computer-Integrated Manufacturing · 2026