Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation

Emilio Parisotto, Ruslan Salakhutdinov

Year published: 2021
Citations: 9
Access: Open access

Abstract

Many real-world applications such as robotics provide hard constraints on power and compute that limit the viable model complexity of Reinforcement Learning (RL) agents. Similarly, in many distributed RL settings, acting is done on un-accelerated hardware such as CPUs, which likewise restricts model size to prevent intractable experiment run times. These "actor-latency" constrained settings present a major obstruction to the scaling up of model complexity that has recently been extremely successful in supervised learning. To be able to utilize large model capacity while still operating within the limits imposed by the system during acting, we develop an "Actor-Learner Distillation" (ALD) procedure that leverages a continual form of distillation that transfers learning progress from a large capacity learner model to a small capacity actor model. As a case study, we develop this procedure in the context of partially-observable environments, where transformer models have had large improvements over LSTMs recently, at the cost of significantly higher computational complexity. With transformer models as the learner and LSTMs as the actor, we demonstrate in several challenging memory environments that using Actor-Learner Distillation recovers the clear sample-efficiency gains of the transformer learner model while maintaining the fast inference and reduced total training time of the LSTM actor model.
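The abstract does not say which loss drives the distillation, so the following is only a minimal sketch of one common choice: a KL divergence that pulls the small actor's action distribution toward the large learner's. The function names `log_softmax` and `policy_distillation_loss`, and the use of raw per-step action logits as inputs, are assumptions made for illustration and are not taken from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of action logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def policy_distillation_loss(learner_logits, actor_logits):
    """Mean KL(learner || actor) over a batch of time steps.

    Hypothetical helper: each argument is a list of per-step logit lists,
    one from the large learner (e.g. a transformer) and one from the small
    actor (e.g. an LSTM), evaluated on the same observations.
    """
    total = 0.0
    for l_step, a_step in zip(learner_logits, actor_logits):
        log_p = log_softmax(l_step)  # learner (teacher) log-probabilities
        log_q = log_softmax(a_step)  # actor (student) log-probabilities
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(learner_logits)


# The loss is zero when the actor matches the learner and positive otherwise.
same = policy_distillation_loss([[1.0, 2.0, 0.5]], [[1.0, 2.0, 0.5]])
diff = policy_distillation_loss([[2.0, 0.0]], [[0.0, 2.0]])
```

In a continual setup like the one the abstract describes, a term of this kind would be minimized with respect to the actor's parameters throughout training, while the learner is trained with the RL objective on data collected by the actor.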

Keywords

Reinforcement learning, Computer science, Transformer, Inference, Artificial intelligence, Machine learning, Distillation, Robotics, Robot
