Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters
Vladislav Kurenkov, С. В. Колесников
- Publication year
- 2021
- Citations
- 2
- Access
- Open access
Abstract
In this work, we argue that the online evaluation budget is essential for a reliable comparison of deep offline RL algorithms. First, we show that the online evaluation budget is problem-dependent: some problems allow for a smaller budget, others for a larger one. Second, we demonstrate that the preference between algorithms is budget-dependent across a diverse range of decision-making domains such as Robotics, Finance, and Energy Management. Based on these points, we suggest reporting the performance of deep offline RL algorithms under varying online evaluation budgets. To facilitate this, we propose to use a reporting tool from the NLP field, Expected Validation Performance. This technique makes it possible to reliably estimate expected maximum performance under different budgets while not requiring any additional computation beyond hyperparameter search. By employing this tool, we also show that Behavioral Cloning is often preferable to offline RL algorithms when working within a limited budget.
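The idea of estimating expected maximum performance under a given budget can be illustrated with a minimal sketch. It assumes that the scores come from independent random hyperparameter search trials, that a budget of k means k trained policies are evaluated online and the best one is kept, and that the k policies are drawn with replacement from the empirical distribution of observed scores. The function name `expected_max_performance` and the example scores are hypothetical and do not come from the paper.

```python
def expected_max_performance(scores, budget):
    """Expected best score among `budget` draws (with replacement)
    from the empirical distribution of `scores`.

    Assumption: `scores` are i.i.d. results of random hyperparameter
    search; `budget` is the number of policies evaluated online.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    if len(scores) == 0:
        raise ValueError("scores must not be empty")

    ordered = sorted(float(s) for s in scores)
    n = len(ordered)

    total = 0.0
    for i, value in enumerate(ordered, start=1):
        # P(max of `budget` draws equals the i-th smallest score)
        # = F(i)^budget - F(i-1)^budget, with empirical CDF F(i) = i / n.
        prob_is_max = (i / n) ** budget - ((i - 1) / n) ** budget
        total += value * prob_is_max
    return total


if __name__ == "__main__":
    # Illustrative (made-up) returns of policies from one hyperparameter search.
    example_scores = [1.0, 2.0, 3.0, 4.0]
    for k in (1, 2, 4, 16):
        print(k, expected_max_performance(example_scores, k))
```

With a budget of 1 the estimate equals the mean of the observed scores, and it approaches the best observed score as the budget grows. Plotting this curve for each algorithm is one way to see how the preferred algorithm can change with the budget, using only scores already collected during hyperparameter search.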