Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation

Emilio Parisotto, Ruslan Salakhutdinov

Year: 2021
Citations: 9
Access: Open access

Abstract

Many real-world applications such as robotics provide hard constraints on power and compute that limit the viable model complexity of Reinforcement Learning (RL) agents. Similarly, in many distributed RL settings, acting is done on un-accelerated hardware such as CPUs, which likewise restricts model size to prevent intractable experiment run times. These "actor-latency" constrained settings present a major obstruction to the scaling up of model complexity that has recently been extremely successful in supervised learning. To be able to utilize large model capacity while still operating within the limits imposed by the system during acting, we develop an "Actor-Learner Distillation" (ALD) procedure that leverages a continual form of distillation that transfers learning progress from a large capacity learner model to a small capacity actor model. As a case study, we develop this procedure in the context of partially-observable environments, where transformer models have had large improvements over LSTMs recently, at the cost of significantly higher computational complexity. With transformer models as the learner and LSTMs as the actor, we demonstrate in several challenging memory environments that using Actor-Learner Distillation recovers the clear sample-efficiency gains of the transformer learner model while maintaining the fast inference and reduced total training time of the LSTM actor model.
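The abstract does not state which loss drives the distillation. As a rough illustration only, one common choice for transferring a policy from a large model to a small one is a KL divergence between their action distributions on the same state. The sketch below assumes that choice; the function names (`softmax`, `kl_divergence`, `distillation_loss`) and the KL direction are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Convert raw action logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(learner_logits, actor_logits):
    """Assumed distillation term: KL(learner policy || actor policy) on one state.

    The learner (large model) plays the teacher and the actor (small model)
    plays the student. This is an illustrative assumption, not the paper's
    stated objective.
    """
    teacher = softmax(learner_logits)
    student = softmax(actor_logits)
    return kl_divergence(teacher, student)

# Example: the loss is zero when both models agree and positive otherwise.
same = distillation_loss([2.0, 0.0, -1.0], [2.0, 0.0, -1.0])
different = distillation_loss([2.0, 0.0, -1.0], [0.0, 0.0, 0.0])
```

In a continual setting such as the one the abstract describes, a term like this would be minimized with respect to the actor's parameters throughout training, so the small actor keeps tracking the learner as the learner improves.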

Keywords

Reinforcement learning, Computer science, Transformer, Inference, Artificial intelligence, Machine learning, Distillation, Robotics, Robot
