PERCEPTION

Improving Unimodal Object Recognition with Multimodal Contrastive Learning

Johannes Meyer, Andreas Eitel, Thomas Brox, Wolfram Burgard

Year: 2020
Citations: 21

Abstract

Robots perceive their environment using various sensor modalities, e.g., vision, depth, sound or touch. Each modality provides complementary information for perception. However, while it can be assumed that all modalities are available during training, the sensor setup often varies when the robot is deployed in real-world scenarios. To gain flexibility with respect to the deployed sensor setup, we propose a new multimodal approach within the framework of contrastive learning. In particular, we consider the case of learning from RGB-D images while testing with only one modality available, i.e., either RGB or depth alone. We leverage contrastive learning to capture high-level information shared between different modalities in a compact feature embedding. We extensively evaluate our multimodal contrastive learning method on the Falling Things dataset and learn representations that outperform prior methods for RGB-D object recognition on the NYU-D dataset. Our code and details on the datasets used are available at: https://github.com/meyerjo/MultiModalContrastiveLearning.
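The abstract does not spell out the training objective. As a hedged illustration of the general idea of cross-modal contrastive learning, the sketch below computes a generic symmetric InfoNCE loss between paired RGB and depth embeddings: the embedding of each RGB image should be closest to the embedding of its own depth image among all depth embeddings in the batch, and vice versa. This is an assumption, not the paper's exact method: the function names, the temperature of 0.1 and the symmetric formulation are illustrative choices, and `rgb_emb`/`depth_emb` stand in for the outputs of two hypothetical encoders. The authors' actual objective is in the linked repository.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Project onto the unit sphere so the dot product becomes cosine similarity.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.1) -> float:
    """Mean cross-entropy where anchors[i] must identify positives[i] among all positives.

    The other entries of `positives` act as negatives (illustrative in-batch negatives).
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [dot(ai, pj) / temperature for pj in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]  # -log softmax of the matching pair
    return total / len(a)


def cross_modal_loss(rgb_emb: List[Vector], depth_emb: List[Vector], temperature: float = 0.1) -> float:
    # Hypothetical symmetric objective: RGB -> depth and depth -> RGB.
    return 0.5 * (info_nce(rgb_emb, depth_emb, temperature)
                  + info_nce(depth_emb, rgb_emb, temperature))


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0]]
    depth_aligned = [[0.9, 0.1], [0.1, 0.9]]  # each depth embedding matches its RGB pair
    depth_swapped = [[0.1, 0.9], [0.9, 0.1]]  # pairs are mismatched
    print(cross_modal_loss(rgb, depth_aligned))  # close to zero
    print(cross_modal_loss(rgb, depth_swapped))  # much larger
```

Minimizing such a loss pulls the two views of the same scene together in a shared embedding space, which is what would allow a single-modality encoder to be used on its own at test time, under the assumptions stated above.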

Keywords

Computer science, Artificial intelligence, Modality (human–computer interaction), Modalities, Multimodal learning, Leverage (statistics), Embedding, RGB color model, Robot, Feature learning
