LEARNING

ORCHID: Optimisation of Robotic Control and Hardware In Design using Reinforcement Learning

Lucy Jackson, Celyn Walters, Steve Eckersley, Pete Senior, Simon Hadfield

Year of publication
2021
Citations
9

Abstract

The successful performance of any system is dependent on the hardware of the agent, which is typically immutable during RL training. In this work, we present ORCHID (Optimisation of Robotic Control and Hardware In Design), which allows for truly simultaneous optimisation of hardware and control parameters in an RL pipeline. We show that by forming a complex differential path through a trajectory rollout, we can leverage a vast amount of information from the system that was previously lost in the ‘black-box’ environment. Combining this with a novel hardware-conditioned critic network minimises variance during training and ensures that stable updates are made. This allows refinements to be made to both the morphology and control parameters simultaneously. The result is an efficient and versatile approach to holistic robot design that brings the final system nearer to true optimality. We show improvements in performance across four different test environments with two different control algorithms; in all experiments, the maximum performance achieved with ORCHID is shown to be unattainable using only policy updates with the default design. We also show how re-designing a robot using ORCHID in simulation transfers to a vast improvement in the performance of a real-world robot.
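
The idea of a differentiable path through a rollout can be illustrated with a toy sketch. This is not the authors' implementation: it omits the policy network, the hardware-conditioned critic and the RL algorithm entirely. It assumes a hypothetical 1-D point mass whose `mass` stands in for a hardware parameter and whose controller `gain` stands in for a control parameter, and it uses a small forward-mode dual-number class (`Dual`, also hypothetical) to obtain the gradient of the rollout return with respect to both parameters in a single pass.

```python
# Toy illustration only; the dynamics, names and reward are assumptions,
# not taken from the ORCHID paper.

class Dual:
    """A value together with its partial derivatives w.r.t. (mass, gain)."""

    def __init__(self, val, grad=(0.0, 0.0)):
        self.val = val
        self.grad = grad

    @staticmethod
    def lift(x):
        # Wrap plain numbers as constants (zero derivative).
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val,
                    tuple(a + b for a, b in zip(self.grad, o.grad)))
    __radd__ = __add__

    def __neg__(self):
        return Dual(-self.val, tuple(-a for a in self.grad))

    def __sub__(self, other):
        return self + (-Dual.lift(other))

    def __rsub__(self, other):
        return Dual.lift(other) + (-self)

    def __mul__(self, other):
        o = Dual.lift(other)
        # Product rule.
        return Dual(self.val * o.val,
                    tuple(a * o.val + self.val * b
                          for a, b in zip(self.grad, o.grad)))
    __rmul__ = __mul__

    def __truediv__(self, other):
        o = Dual.lift(other)
        # Quotient rule.
        return Dual(self.val / o.val,
                    tuple((a * o.val - self.val * b) / (o.val * o.val)
                          for a, b in zip(self.grad, o.grad)))


def rollout_return(mass, gain, steps=50, dt=0.05, target=1.0):
    """Simulate the point mass and return the summed (negative) cost.

    Works with plain floats or with Dual numbers, so the same code path
    is used for evaluation and for differentiation.
    """
    pos, vel, total = 0.0, 0.0, 0.0
    for _ in range(steps):
        force = gain * (target - pos) - 0.5 * vel   # controller with fixed damping
        vel = vel + dt * force / mass               # hardware enters the dynamics here
        pos = pos + dt * vel
        err = target - pos
        total = total - err * err - 0.01 * force * force
    return total


def joint_gradient(mass, gain):
    """Return (return value, (d/d mass, d/d gain)) from one differentiable rollout."""
    out = rollout_return(Dual(mass, (1.0, 0.0)), Dual(gain, (0.0, 1.0)))
    return out.val, out.grad
```

With both partial derivatives available from the same rollout, a single update step could adjust the hardware and control parameters together, for example `mass += lr * d_mass` and `gain += lr * d_gain` for some small, hypothetical step size `lr`.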

Keywords

Reinforcement learning, Leverage (statistics), Computer science, Pipeline (software), Robot, Trajectory, Robotic arm, Control (management), Computer hardware, Embedded system
