Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning

Mi Luo, Zihui Xue, Alex Dimakis, Kristen Grauman

Publication year
2025
Citations
1

Abstract

Egocentric and exocentric perspectives of human action differ significantly, yet overcoming this extreme viewpoint gap is critical in augmented reality and robotics. We propose VIEWPOINTROSETTA, an approach that unlocks large-scale unpaired ego and exo video data to learn clip-level viewpoint-invariant video representations. Our framework introduces (1) a diffusion-based Rosetta Stone Translator (RST), which, leveraging a moderate amount of synchronized multi-view videos, serves as a translator in feature space to decipher the alignment between unpaired ego and exo data, and (2) a dual encoder that aligns unpaired data representations through contrastive learning with RST-based synthetic feature augmentation and soft alignment. To evaluate the learned features in a standardized setting, we construct a new cross-view benchmark using Ego-Exo4D, covering cross-view retrieval, action recognition, and skill assessment tasks. Our framework demonstrates superior cross-view understanding compared to previous view-invariant learning and ego video representation learning approaches, and opens the door to bringing vast amounts of traditional third-person video to bear on the more nascent first-person setting.
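
The abstract mentions contrastive learning with soft alignment between ego and exo representations but does not give the loss. As a rough illustration only, the sketch below shows one generic way a contrastive objective can accept soft targets in place of one-hot pairings: a cross-entropy between a row-wise softmax over ego-to-exo cosine similarities and a target distribution. The function names, the temperature value, the toy features, and the soft target matrix are all assumptions made for this example; none of this is the authors' implementation, and the role of the RST in producing targets is not modeled here.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (assumes v is not the zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    # Numerically stable log-softmax over a list of logits.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def soft_contrastive_loss(ego_feats, exo_feats, soft_targets, temperature=0.07):
    # Cross-entropy between soft targets and the softmax over ego-to-exo
    # cosine similarities. With one-hot targets this reduces to a
    # one-directional InfoNCE-style loss.
    ego = [l2_normalize(v) for v in ego_feats]
    exo = [l2_normalize(v) for v in exo_feats]
    total = 0.0
    for e, target_row in zip(ego, soft_targets):
        logits = [sum(a * b for a, b in zip(e, x)) / temperature for x in exo]
        log_probs = log_softmax(logits)
        total -= sum(t * lp for t, lp in zip(target_row, log_probs))
    return total / len(ego)


# Toy 2-D features for two ego clips and two exo clips (illustrative values).
ego_feats = [[1.0, 0.0], [0.0, 1.0]]
exo_feats = [[0.9, 0.1], [0.2, 0.8]]

hard_targets = [[1.0, 0.0], [0.0, 1.0]]  # strict one-to-one pairing
soft_targets = [[0.8, 0.2], [0.2, 0.8]]  # hypothetical soft alignment weights

loss_hard = soft_contrastive_loss(ego_feats, exo_feats, hard_targets)
loss_soft = soft_contrastive_loss(ego_feats, exo_feats, soft_targets)
```

With these toy inputs the features already match the one-to-one pairing closely, so the hard-target loss is near zero, while the soft targets assign weight to the less similar pairs and give a larger loss.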

Keywords

Artificial intelligence; Id, ego and super-ego; Computer science; Representation (politics); Invariant (physics); Computer vision; Cognitive science; Psychology; Mathematics; Social psychology
