LEARNING

Selecting the Partial State Abstractions of MDPs: A Metareasoning Approach with Deep Reinforcement Learning

Samer B. Nashed, Justin Svegliato, Abhinav Bhatia, Stuart Russell, Shlomo Zilberstein

Year of publication: 2022
Citations: 3

Abstract

Markov decision processes (MDPs) are a common general-purpose model used in robotics for representing sequential decision-making problems. Given the complexity of robotics applications, a popular approach for approximately solving MDPs relies on state aggregation to reduce the size of the state space, but at the expense of policy fidelity, offering a trade-off between policy quality and computation time. Naturally, this poses a challenging metareasoning problem: how can an autonomous system dynamically select different state abstractions that optimize this trade-off as it operates online? In this paper, we formalize this metareasoning problem with a notion of time-dependent utility and solve it using deep reinforcement learning. To do this, we develop several general, cheap heuristics that summarize the reward structure and transition topology of the MDP at hand to serve as effective features. Empirically, we demonstrate that our metareasoning approach outperforms several baseline approaches and a strong heuristic approach on a standard benchmark domain.
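
The trade-off between policy quality and computation time can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not give the form of the time-dependent utility, so the linear time cost below is an assumption, and the candidate names, quality estimates, and solve times are hypothetical values chosen for illustration. The paper learns the selection with deep reinforcement learning, whereas this sketch only does a one-shot comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical state abstraction with illustrative estimates."""
    name: str
    quality: float        # estimated policy quality, 1.0 = ground-MDP optimum
    solve_seconds: float  # estimated time to solve the abstract MDP


def time_dependent_utility(quality: float, seconds: float, cost_rate: float) -> float:
    """Assumed form: policy quality minus a linear cost of computation time."""
    return quality - cost_rate * seconds


def select_abstraction(candidates, cost_rate):
    """Return the candidate with the highest assumed time-dependent utility."""
    return max(
        candidates,
        key=lambda c: time_dependent_utility(c.quality, c.solve_seconds, cost_rate),
    )


# Hypothetical candidates: coarser abstractions solve faster but lose fidelity.
candidates = [
    Candidate("coarse", 0.60, 1.0),
    Candidate("partial", 0.85, 4.0),
    Candidate("ground", 1.00, 20.0),
]

for rate in (0.001, 0.05, 0.2):
    print(rate, select_abstraction(candidates, rate).name)
```

Under these assumed numbers, a low time cost favors solving the ground MDP, a high time cost favors the coarsest abstraction, and intermediate costs favor a partial abstraction.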

Keywords

Reinforcement learning, Computer science, Markov decision process, Artificial intelligence, Heuristics, Benchmark (surveying), Robotics, State space, Heuristic, Machine learning
