LEARNING

Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters

Vladislav Kurenkov, С. В. Колесников

Publication year
2021
Citations
2
Access
Open access

Abstract

In this work, we argue for the importance of an online evaluation budget for a reliable comparison of deep offline RL algorithms. First, we show that the online evaluation budget is problem-dependent: some problems allow for a smaller budget, others for a larger one. Second, we demonstrate that the preference between algorithms is budget-dependent across a diverse range of decision-making domains such as Robotics, Finance, and Energy Management. Following these points, we suggest reporting the performance of deep offline RL algorithms under varying online evaluation budgets. To facilitate this, we propose to use a reporting tool from the NLP field, Expected Validation Performance. This technique makes it possible to reliably estimate expected maximum performance under different budgets while not requiring any additional computation beyond hyperparameter search. By employing this tool, we also show that Behavioral Cloning is often preferable to offline RL algorithms when working within a limited budget.
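As a rough illustration of the reporting tool named in the abstract, the sketch below computes an expected-maximum curve in the way Expected Validation Performance is commonly defined: the expected best score among k configurations drawn with replacement from the empirical distribution of the scores already collected during hyperparameter search. The function name `expected_max_performance` and the scores are hypothetical and not taken from the paper. Treating each online policy evaluation as one unit of budget is an assumption made here for illustration.

```python
def expected_max_performance(scores, budget):
    """Expected best score among `budget` i.i.d. draws (with replacement)
    from the empirical distribution of `scores`.

    Uses the identity P(max <= v_(i)) = (i / n) ** budget, where v_(i) is
    the i-th smallest of the n observed scores.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    if not scores:
        raise ValueError("scores must be non-empty")

    ordered = sorted(scores)
    n = len(ordered)
    total = 0.0
    for i, value in enumerate(ordered, start=1):
        # Probability that the maximum of `budget` draws equals the i-th
        # smallest observed score.
        prob = (i / n) ** budget - ((i - 1) / n) ** budget
        total += value * prob
    return total


# Hypothetical normalized returns of four hyperparameter configurations.
scores = [0.2, 0.5, 0.9, 0.4]

# Expected best performance for budgets of 1..4 evaluated policies.
curve = [expected_max_performance(scores, k) for k in range(1, len(scores) + 1)]
```

With a budget of one the estimate reduces to the mean of the observed scores, and it approaches the best observed score as the budget grows. Comparing such curves for two algorithms shows how the preferred algorithm can change with the budget.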

Keywords

Hyperparameter, Reinforcement learning, Computer science, Artificial intelligence, Machine learning, Online and offline, Range (aeronautics), Field (mathematics), Preference, Computation
