

Learning Visual Localization of a Quadrotor Using Its Noise as Self-Supervision

Mirko Nava, Antonio Paolillo, Jérôme Guzzi, Luca Maria Gambardella, Alessandro Giusti

Year published: 2022
Citations: 15
Access: Open access

Abstract

We introduce an approach to train neural network models for visual object localization using a small training set labeled with ground-truth object positions and a large unlabeled one. We assume that the object to be localized emits sound, which is perceived by a microphone rigidly affixed to the camera. This information is used as the target of a cross-modal pretext task: predicting sound features from camera frames. By solving the pretext task, the model draws self-supervision from visual and audio data. The approach is well suited to robot learning: we instantiate it to localize a small quadrotor from 128 × 80 pixel images acquired by a ground robot. Experiments on a separate test set show that introducing the auxiliary pretext task yields large performance improvements: the mean absolute error (MAE) of the estimated image coordinates of the target is reduced from 7 to 4 pixels, and the MAE of the estimated distance is reduced from 28 cm to 14 cm. A model that has access to labels for the entire training set yields an MAE of 2 pixels and 11 cm, respectively.
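The abstract describes training with two objectives: a localization loss that can only be computed on the small labeled subset, and a pretext loss (predicting microphone-derived sound features from the camera frame) that can be computed on every frame. The sketch below shows one way such a masked multi-task loss could be assembled from precomputed model outputs. It is illustrative only: the `Sample` structure, the use of mean squared error for both terms, and the weight `pretext_weight` are hypothetical assumptions, not the authors' exact formulation. The `mae` helper mirrors the error metric reported in the abstract.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sample:
    """Model outputs and targets for one camera frame (hypothetical structure)."""
    pred_pos: Sequence[float]    # predicted (u, v, distance) from the image
    pred_audio: Sequence[float]  # sound features predicted from the image
    audio: Sequence[float]       # sound features measured by the microphone
    true_pos: Optional[Sequence[float]] = None  # ground truth, None if unlabeled


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combined_loss(batch: Sequence[Sample], pretext_weight: float = 1.0) -> float:
    """Localization loss on labeled frames plus weighted pretext loss on all frames.

    Assumption: both terms use MSE; pretext_weight is a hypothetical knob.
    """
    labeled = [s for s in batch if s.true_pos is not None]
    # End-task term: only frames with ground-truth positions contribute.
    task = (sum(mse(s.pred_pos, s.true_pos) for s in labeled) / len(labeled)
            if labeled else 0.0)
    # Pretext term: every frame has microphone data, so all frames contribute.
    pretext = sum(mse(s.pred_audio, s.audio) for s in batch) / len(batch)
    return task + pretext_weight * pretext


def mae(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Mean absolute error over scalar estimates (e.g. pixels or centimeters)."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)
```

Because unlabeled frames enter only the pretext term, adding more of them increases the training signal without requiring any position labels; with no labeled frames in a batch, the loss reduces to the pretext term alone.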

Keywords

Artificial intelligence, Pretext, Pixel, Computer science, Computer vision, Task (project management), Robot, Set (abstract data type), Noise (video), Frame (networking)
