Home /Research /Y-MAP-Net: Real-time depth, normals, segmentation, multi-label captioning and 2D human pose in RGB images
LEARNING

Y-MAP-Net: Real-time depth, normals, segmentation, multi-label captioning and 2D human pose in RGB images

Ammar Qammaz, Nikolaos Vasilikopoulos, Iason Oikonomidis, Antonis A. Argyros

Year
2024
Access
Open access

Abstract

We present Y-MAP-Net, a Y-shaped neural network architecture designed for real-time multi-task learning on RGB images. Y-MAP-Net, simultaneously predicts depth, surface normals, human pose, semantic segmentation and generates multi-label captions, all from a single network evaluation. To achieve this, we adopt a multi-teacher, single-student training paradigm, where task-specific foundation models supervise the network's learning, enabling it to distill their capabilities into a lightweight architecture suitable for real-time applications. Y-MAP-Net, exhibits strong generalization, simplicity and computational efficiency, making it ideal for robotics and other practical scenarios. To support future research, we will release our code publicly.

Keywords

cs.CV

Related papers

Browse all LEARNING papers