MANIPULATION

DINOBot: Robot Manipulation via Retrieval and Alignment with Vision Foundation Models

Norman Di Palo, Edward Johns

Year of publication: 2024
Access: Open access

Abstract

We propose DINOBot, a novel imitation learning framework for robot manipulation, which leverages the image-level and pixel-level capabilities of features extracted from Vision Transformers trained with DINO. When interacting with a novel object, DINOBot first uses these features to retrieve the most visually similar object experienced during human demonstrations, and then uses this object to align its end-effector with the novel object to enable effective interaction. Through a series of real-world experiments on everyday tasks, we show that exploiting both the image-level and pixel-level properties of vision foundation models enables unprecedented learning efficiency and generalisation. Videos and code are available at https://www.robot-learning.uk/dinobot.
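The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each demonstrated object is summarised by a single image-level feature vector and that "most visually similar" means highest cosine similarity, which the abstract does not specify. The function names, object labels, and three-dimensional feature values are hypothetical toy stand-ins for real DINO features, and the alignment step is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_most_similar(query, memory):
    """Return (name, score) of the stored object closest to the query.

    `memory` maps a hypothetical object name to its feature vector,
    standing in for features recorded during demonstrations.
    """
    if not memory:
        raise ValueError("memory of demonstrations is empty")
    best = max(memory, key=lambda name: cosine_similarity(query, memory[name]))
    return best, cosine_similarity(query, memory[best])


# Toy feature vectors (assumed values, not real DINO outputs).
demo_memory = {
    "mug": [0.9, 0.1, 0.0],
    "hammer": [0.0, 0.2, 0.9],
}
novel_object = [0.8, 0.2, 0.1]

name, score = retrieve_most_similar(novel_object, demo_memory)
print(name, round(score, 3))
```

In this toy setup the novel object's features lie closest to the stored "mug" vector, so the demonstration for that object would be the one retrieved.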

Keywords

cs.RO, cs.LG

