LEARNING

Learning Visual Localization of a Quadrotor Using Its Noise as Self-Supervision

Mirko Nava, Antonio Paolillo, Jérôme Guzzi, Luca Maria Gambardella, Alessandro Giusti

Publication year
2022
Citations
15
Access
Open access

Abstract

We introduce an approach to train neural network models for visual object localization using a small training set, labeled with ground-truth object positions, and a large unlabeled one. We assume that the object to be localized emits sound, which is perceived by a microphone rigidly affixed to the camera. This information is used as the target of a cross-modal pretext task: predicting sound features from camera frames. By solving the pretext task, the model draws self-supervision from visual and audio data. The approach is well suited to robot learning: we instantiate it to localize a small quadrotor from 128 × 80 pixel images acquired by a ground robot. Experiments on a separate testing set show that introducing the auxiliary pretext task yields large performance improvements: the Mean Absolute Error (MAE) of the estimated image coordinates of the target is reduced from 7 to 4 pixels, and the MAE of the estimated distance is reduced from 28 cm to 14 cm. A model that has access to labels for the entire training set yields an MAE of 2 pixels and 11 cm, respectively.
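The training idea in the abstract, a supervised localization term available only for the small labeled set plus a pretext term available for every frame, can be illustrated with a minimal sketch. The abstract does not give the loss function, the weighting, or the form of the sound features, so everything here is an assumption made for illustration: the function names `mean_absolute_error` and `combined_loss`, the `pretext_weight` parameter, and the use of MAE as the training loss are all hypothetical.

```python
from typing import Optional, Sequence


def mean_absolute_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average of |pred_i - target_i| over all components."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def combined_loss(
    position_pred: Sequence[float],
    position_label: Optional[Sequence[float]],
    sound_pred: Sequence[float],
    sound_features: Sequence[float],
    pretext_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Pretext term for every sample; position term only when a label exists."""
    # The pretext target (sound features) is available for all frames,
    # because the microphone records alongside the camera.
    pretext = mean_absolute_error(sound_pred, sound_features)
    if position_label is None:
        # Unlabeled frame: only the cross-modal pretext task contributes.
        return pretext_weight * pretext
    # Labeled frame: add the supervised localization error.
    return mean_absolute_error(position_pred, position_label) + pretext_weight * pretext


# Made-up numbers: predicted image coordinates in pixels and two sound features.
labeled = combined_loss([60.0, 40.0], [64.0, 38.0], [0.5, 0.25], [0.25, 0.75], 0.5)
unlabeled = combined_loss([60.0, 40.0], None, [0.5, 0.25], [0.25, 0.75], 0.5)
print(labeled, unlabeled)
```

In this sketch an unlabeled frame still produces a training signal through the sound-prediction term, which is the sense in which the large unlabeled set contributes to learning.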

Keywords

Artificial intelligence, Pretext, Pixel, Computer science, Computer vision, Task (project management), Robot, Set (abstract data type), Noise (video), Frame (networking)
