On Meta-Reinforcement Learning in task distributions with varying dynamics

Federico Retyk

Year: 2021
Citations: 2
Access: Open access

Abstract

Meta-reinforcement learning has the potential to enable artificial agents to master new skills with improved sample-efficiency by leveraging previous learning experience in tasks that are diverse but share common structure. Our focus is to study the application of such algorithms to task distributions where the function that controls the dynamics of the environment is the main factor of variation. We start by providing an introductory background for related fields, including deep reinforcement learning, variational inference, and meta-learning. Then, we conduct a non-systematic review of the state-of-the-art algorithms for meta-reinforcement learning and perform an empirical investigation of PEARL, a method that combines soft actor-critic with latent task variables. Based on our review, we propose and implement two algorithmic modifications for PEARL: one that aims to improve the meta-training sample complexity by automatically adjusting a critical hyperparameter, and a second one focused on improving the meta-testing asymptotic performance by fine-tuning the policy during adaptation. Using a new multi-task environment suite for simulated robotics continuous control tasks, we compare the original version of PEARL and our proposed modifications, obtaining favourable results. Finally, we ponder our findings and suggest future research directions.

Keywords

Reinforcement learning; Dynamics (music); Task (project management); Reinforcement; Computer science; Artificial intelligence; Cognitive psychology; Psychology; Engineering; Social psychology
