Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters
Vladislav Kurenkov, С. В. Колесников
- Year: 2021
- Citations: 2
- Access: Open access
Abstract
In this work, we argue for the importance of an online evaluation budget for a reliable comparison of deep offline RL algorithms. First, we delineate that the online evaluation budget is problem-dependent, where some problems allow for less but others for more. And second, we demonstrate that the preference between algorithms is budget-dependent across a diverse range of decision-making domains such as Robotics, Finance, and Energy Management. Following the points above, we suggest reporting the performance of deep offline RL algorithms under varying online evaluation budgets. To facilitate this, we propose to use a reporting tool from the NLP field, Expected Validation Performance. This technique makes it possible to reliably estimate expected maximum performance under different budgets while not requiring any additional computation beyond hyperparameter search. By employing this tool, we also show that Behavioral Cloning is often more favorable to offline RL algorithms when working within a limited budget.
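The idea of estimating expected maximum performance under a budget can be illustrated with a minimal sketch. It assumes the common formulation in which the scores from a finished hyperparameter search are treated as an empirical distribution, and the expected best score among `budget` independent draws is computed from the empirical CDF. The function name `expected_max_performance`, the labels `algo_a` and `algo_b`, and all scores are hypothetical placeholders, not values or code from the paper.

```python
def expected_max_performance(scores, budget):
    """Expected best score among `budget` i.i.d. draws (with replacement)
    from the empirical distribution of `scores`.

    Uses P(max <= v_i) = (i / n) ** budget for the i-th smallest score.
    """
    if budget < 1 or not scores:
        raise ValueError("need at least one score and a budget >= 1")
    ordered = sorted(scores)
    n = len(ordered)
    total = 0.0
    for i, value in enumerate(ordered, start=1):
        # Probability that the maximum of the draws is exactly the i-th score.
        p_max_is_value = (i / n) ** budget - ((i - 1) / n) ** budget
        total += value * p_max_is_value
    return total


# Made-up placeholder scores from two hypothetical hyperparameter searches.
algo_a = [0.60, 0.62, 0.58, 0.61, 0.59]  # stable across hyperparameters
algo_b = [0.10, 0.20, 0.90, 0.15, 0.30]  # sensitive, with one strong setting

for budget in (1, 2, 5):
    a = expected_max_performance(algo_a, budget)
    b = expected_max_performance(algo_b, budget)
    print(f"budget={budget}: algo_a={a:.3f}, algo_b={b:.3f}")
```

With these placeholder numbers, the stable method scores higher when only one policy can be evaluated online, while the sensitive method overtakes it once the budget is large enough to find its good setting. The computation reuses the scores already collected during hyperparameter search, so it needs no extra training runs.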