PERCEPTION

Improving Unimodal Object Recognition with Multimodal Contrastive Learning

Johannes Meyer, Andreas Eitel, Thomas Brox, Wolfram Burgard

Year of publication
2020
Citations
21

Abstract

Robots perceive their environment using various sensor modalities, e.g., vision, depth, sound, or touch. Each modality provides complementary information for perception. However, while it can be assumed that all modalities are available for training, the sensor setup often varies when the robot is deployed in real-world scenarios. To gain flexibility with respect to the deployed sensor setup, we propose a new multimodal approach within the framework of contrastive learning. In particular, we consider the case of learning from RGB-D images while testing with only one modality available, i.e., exclusively RGB or depth. We leverage contrastive learning to capture high-level information between different modalities in a compact feature embedding. We extensively evaluate our multimodal contrastive learning method on the Falling Things dataset and learn representations that outperform prior methods for RGB-D object recognition on the NYU-D dataset. Our code and details on the datasets used are available at: https://github.com/meyerjo/MultiModalContrastiveLearning.
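
To make the idea of contrasting modalities concrete, the following is a minimal sketch of an InfoNCE-style cross-modal loss, a common contrastive objective. It is an assumption for illustration only: the abstract does not specify the loss, encoder, or training setup, and the function names (`l2_normalize`, `cross_modal_info_nce`), the `temperature` value, and the toy embeddings are all hypothetical. The sketch treats the RGB and depth embeddings of the same image as a positive pair and every other depth embedding in the batch as a negative.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_modal_info_nce(rgb_embs, depth_embs, temperature=0.1):
    """Illustrative InfoNCE-style loss (assumed, not taken from the paper).

    rgb_embs and depth_embs are equally long lists of embedding vectors;
    entry i of each list is assumed to come from the same RGB-D image.
    """
    rgb = [l2_normalize(v) for v in rgb_embs]
    depth = [l2_normalize(v) for v in depth_embs]
    total = 0.0
    for i, anchor in enumerate(rgb):
        # Similarity of this RGB embedding to every depth embedding in the batch.
        logits = [sum(a * b for a, b in zip(anchor, d)) / temperature for d in depth]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        # Negative log-probability of picking the matching depth embedding.
        total += log_sum_exp - logits[i]
    return total / len(rgb)


# Toy embeddings (hypothetical values): two images, two dimensions each.
rgb = [[1.0, 0.0], [0.0, 1.0]]
aligned_depth = [[0.9, 0.1], [0.1, 0.9]]   # depth embeddings agree with RGB
swapped_depth = [[0.1, 0.9], [0.9, 0.1]]   # depth embeddings paired with the wrong image

loss_aligned = cross_modal_info_nce(rgb, aligned_depth)
loss_swapped = cross_modal_info_nce(rgb, swapped_depth)
print("aligned loss:", loss_aligned)
print("swapped loss:", loss_swapped)
```

In this toy setting the loss is lower when the depth embedding of each image lies close to its RGB embedding, which is the property that would let an embedding learned from RGB-D pairs still be useful when only one modality is available at test time.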

Keywords

Computer science, Artificial intelligence, Modality (human–computer interaction), Modalities, Multimodal learning, Leverage (statistics), Embedding, RGB color model, Robot, Feature learning
