HARL-A: Hardware Agnostic Reinforcement Learning Through Adversarial Selection

Lucy Jackson, Steve Eckersley, Pete Senior, Simon Hadfield

Year of publication
2021
Citations
2

Abstract

The use of reinforcement learning (RL) has led to huge advancements in the field of robotics. However, data scarcity, brittle convergence, and the gap between simulated and real-world environments mean that most common RL approaches are prone to overfitting and fail to generalise to unseen environments. Hardware agnostic policies would mitigate this by allowing a single network to operate in a variety of test domains, where dynamics vary due to changes in robotic morphologies or internal parameters. We utilise the idea that learning to adapt a known and successful control policy is easier and more flexible than jointly learning numerous control policies for different morphologies. This paper presents the idea of Hardware Agnostic Reinforcement Learning using Adversarial selection (HARL-A). In this approach, training examples are sampled using a novel adversarial loss function, which is designed to self-regulate morphologies based on their learning potential. Simply applying our learning-potential-based loss function to the current state of the art already provides a ~30% improvement in performance. Meanwhile, experiments using the full implementation of HARL-A report an average improvement of 70% over a standard RL baseline and 55% over the current state of the art.
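
The abstract does not define the adversarial loss or how learning potential is measured, so the following is only an illustrative sketch of the general idea of sampling training morphologies in proportion to a learning-potential score. The softmax weighting, the `temperature` parameter, the function names, and the example scores are all assumptions made for illustration; they are not the HARL-A loss function.

```python
import math
import random


def sampling_weights(potentials, temperature=1.0):
    """Turn per-morphology scores into sampling probabilities.

    ASSUMPTION: a softmax over hypothetical "learning potential" scores.
    The paper's actual adversarial loss is not given in the abstract.
    """
    top = max(potentials)  # subtract the max for numerical stability
    exps = [math.exp((p - top) / temperature) for p in potentials]
    total = sum(exps)
    return [e / total for e in exps]


def sample_morphology(potentials, rng, temperature=1.0):
    """Pick the index of the next training morphology."""
    weights = sampling_weights(potentials, temperature)
    return rng.choices(range(len(potentials)), weights=weights, k=1)[0]


# Hypothetical scores for three morphologies (illustrative values only).
potentials = [0.1, 0.5, 2.0]
weights = sampling_weights(potentials)

rng = random.Random(0)
counts = [0] * len(potentials)
for _ in range(1000):
    counts[sample_morphology(potentials, rng)] += 1
```

Under these assumptions, the morphology with the highest score is drawn most often, while lower-scoring morphologies are still visited occasionally. Raising `temperature` flattens the distribution towards uniform sampling.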

Keywords

Reinforcement learning, Computer science, Adversarial system, Artificial intelligence, Machine learning, Robotics, Function (biology), Selection (genetic algorithm), Variety (cybernetics), State (computer science)
