

Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers

Miguel Saavedra-Ruiz, Sacha Morin, Liam Paull

Year: 2022
Citations: 6

Abstract

In this work, we consider the problem of learning a perception model for monocular robot navigation using few annotated images. Using a Vision Transformer (ViT) pretrained with a label-free self-supervised method, we successfully train a coarse image segmentation model for the Duckietown environment using 70 training images. Our model performs coarse image segmentation at the $8\times 8$ patch level, and the inference resolution can be adjusted to balance prediction granularity and real-time perception constraints. We study how best to adapt a ViT to our task and environment, and find that some lightweight architectures can yield good single-image segmentations at a usable frame rate, even on CPU. The resulting perception model is used as the backbone for a simple yet robust visual servoing agent, which we deploy on a differential drive mobile robot to perform two tasks: lane following and obstacle avoidance.
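
The trade-off between inference resolution and prediction granularity can be illustrated with a small sketch. This is not the authors' implementation. It only assumes that each $8\times 8$ pixel patch receives one class label, as the abstract states. The function names `patch_grid` and `upsample_patch_labels` and the example resolutions are hypothetical and chosen for illustration.

```python
PATCH = 8  # patch side in pixels, taken from the abstract


def patch_grid(height, width, patch=PATCH):
    """Return (rows, cols) of patch-level predictions for an image.

    Assumes the image is cropped so that only whole patches are used.
    """
    return height // patch, width // patch


def upsample_patch_labels(patch_labels, patch=PATCH):
    """Expand a patch-level label map (list of rows) to pixel level.

    Each label is repeated `patch` times along both axes, so the result
    is a blocky, coarse segmentation mask.
    """
    pixels = []
    for row in patch_labels:
        pixel_row = [label for label in row for _ in range(patch)]
        # Copy the row so the output rows are independent lists.
        pixels.extend(list(pixel_row) for _ in range(patch))
    return pixels


if __name__ == "__main__":
    # Hypothetical inference resolutions, not values from the paper.
    for height, width in [(240, 320), (480, 640)]:
        rows, cols = patch_grid(height, width)
        print(f"{height}x{width} -> {rows}x{cols} grid, {rows * cols} patches")
```

Halving the image side lengths cuts the number of patch predictions to a quarter, which is why a lower inference resolution reduces compute at the cost of a coarser mask.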

Keywords

Artificial intelligence, Computer science, Computer vision, Robot, Monocular, Mobile robot navigation, Segmentation, Mobile robot, Image segmentation, Frame rate

