LEARNING

TOM: Learning Policy-Aware Models for Model-Based Reinforcement Learning via Transition Occupancy Matching

Yecheng Jason Ma, Kausik Sivakumar, Jason Yan, Osbert Bastani, Dinesh Jayaraman

Year of publication: 2023
Access: Open access

Abstract

Standard model-based reinforcement learning (MBRL) approaches fit a transition model of the environment to all past experience, but this wastes model capacity on data that is irrelevant for policy improvement. We instead propose a new "transition occupancy matching" (TOM) objective for MBRL model learning: a model is good to the extent that the current policy experiences the same distribution of transitions inside the model as in the real environment. We derive TOM directly from a novel lower bound on the standard reinforcement learning objective. To optimize TOM, we show how to reduce it to a form of importance-weighted maximum-likelihood estimation, where the automatically computed importance weights identify policy-relevant past experiences from a replay buffer, enabling stable optimization. TOM thus offers a plug-and-play model learning sub-routine that is compatible with any backbone MBRL algorithm. On various MuJoCo continuous robotic control tasks, we show that TOM successfully focuses model learning on policy-relevant experience and drives policies faster to higher task rewards than alternative model learning approaches.
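
To make the phrase "importance-weighted maximum-likelihood estimation" concrete, here is a minimal sketch for a toy linear-Gaussian transition model. It is not the authors' implementation. The abstract does not say how TOM computes its importance weights, so the weights below are set by hand as a stand-in, and the names `make_toy_data` and `fit_weighted_linear_gaussian` are hypothetical. The sketch only illustrates that weighting replay-buffer transitions changes which dynamics the model fits.

```python
import numpy as np


def make_toy_data(n=400, seed=0):
    """Toy replay buffer with two dynamics regimes (illustrative assumption).

    The first half of the buffer follows s' = 0.9*s + 0.5*a and is treated
    as policy-relevant; the second half follows s' = -0.5*s + 0.1*a.
    """
    rng = np.random.default_rng(seed)
    s = rng.normal(size=(n, 1))
    a = rng.normal(size=(n, 1))
    relevant = np.arange(n) < n // 2
    s_next = np.where(relevant[:, None], 0.9 * s + 0.5 * a, -0.5 * s + 0.1 * a)
    # Hand-set placeholder weights: 1 for relevant transitions, 0 otherwise.
    # TOM computes its weights automatically; that step is not shown here.
    weights = relevant.astype(float)
    return s, a, s_next, weights


def fit_weighted_linear_gaussian(s, a, s_next, weights):
    """Maximize sum_i w_i * log N(s'_i | [s_i, a_i, 1] @ W, sigma^2 I)."""
    X = np.concatenate([s, a, np.ones((len(s), 1))], axis=1)
    w = weights / weights.sum()  # normalize so the weights sum to one
    # For a Gaussian model, the weighted MLE of W is weighted least squares,
    # solved here by scaling each row by sqrt(w_i).
    sw = np.sqrt(w)[:, None]
    W, *_ = np.linalg.lstsq(sw * X, sw * s_next, rcond=None)
    resid = s_next - X @ W
    # Weighted MLE of the shared isotropic variance.
    sigma2 = float((w[:, None] * resid ** 2).sum() / s_next.shape[1])
    return W, sigma2


s, a, s_next, weights = make_toy_data()
W_weighted, sigma2_weighted = fit_weighted_linear_gaussian(s, a, s_next, weights)
W_uniform, sigma2_uniform = fit_weighted_linear_gaussian(
    s, a, s_next, np.ones(len(s))
)
print("weighted fit coefficients:", W_weighted.ravel())
print("uniform fit coefficients: ", W_uniform.ravel())
```

With uniform weights the single linear model has to compromise between the two regimes. With the placeholder weights it spends its capacity on the transitions marked as relevant, which is the effect the abstract attributes to TOM's automatically computed weights.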

Keywords

cs.LG, cs.AI
