HARL-A: Hardware Agnostic Reinforcement Learning Through Adversarial Selection

Lucy Jackson, Steve Eckersley, Pete Senior, Simon Hadfield

Year: 2021
Citations: 2

Abstract

The use of reinforcement learning (RL) has led to huge advancements in the field of robotics. However, data scarcity, brittle convergence and the gap between simulation and real-world environments mean that most common RL approaches are subject to overfitting and fail to generalise to unseen environments. Hardware agnostic policies would mitigate this by allowing a single network to operate in a variety of test domains, where dynamics vary due to changes in robotic morphologies or internal parameters. We utilise the idea that learning to adapt a known and successful control policy is easier and more flexible than jointly learning numerous control policies for different morphologies. This paper presents the idea of Hardware Agnostic Reinforcement Learning using Adversarial selection (HARL-A). In this approach, training examples are sampled using a novel adversarial loss function, which is designed to self-regulate morphologies based on their learning potential. Simply applying our learning-potential-based loss function to the current state of the art already provides a ~30% improvement in performance. Meanwhile, experiments using the full implementation of HARL-A report an average increase of 70% over a standard RL baseline and 55% compared with the current state of the art.
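
The abstract does not define the adversarial loss or how learning potential is measured, so the following is only a rough illustration of the general idea of sampling training morphologies in proportion to how much they still have to teach the policy. Everything in it is an assumption made for illustration: the function names, the proxy for learning potential (recent change in mean episode return), the softmax weighting and the `temperature` parameter are hypothetical and are not taken from the paper.

```python
import math
import random


def learning_potential(returns, window=3):
    """Hypothetical proxy (not from the paper): change in mean episode
    return between the two most recent windows of training episodes."""
    if len(returns) < 2 * window:
        raise ValueError("need at least 2 * window episode returns")
    recent = sum(returns[-window:]) / window
    earlier = sum(returns[-2 * window:-window]) / window
    return recent - earlier


def selection_probabilities(histories, window=3, temperature=1.0):
    """Softmax over the assumed learning-potential scores, so morphologies
    that are still improving are sampled more often."""
    names = list(histories)
    scores = [learning_potential(histories[n], window) / temperature
              for n in names]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return {n: w / total for n, w in zip(names, weights)}


def sample_morphology(histories, rng, window=3, temperature=1.0):
    """Draw the next training morphology from the softmax distribution."""
    probs = selection_probabilities(histories, window, temperature)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


if __name__ == "__main__":
    # Made-up return histories for two illustrative morphologies.
    histories = {
        "short_arm": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # plateaued
        "long_arm": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],   # still improving
    }
    print(selection_probabilities(histories))
    print(sample_morphology(histories, random.Random(0)))
```

In this sketch a plateaued morphology is sampled rarely and an improving one often; a lower `temperature` sharpens that preference, while a higher one moves the sampler towards uniform selection. The actual HARL-A loss may behave quite differently.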

Keywords

Reinforcement learning, Computer science, Adversarial system, Artificial intelligence, Machine learning, Robotics, Function (biology), Selection (genetic algorithm), Variety (cybernetics), State (computer science)
