
CAGE: Causal Attention Enables Data-Efficient Generalizable Robotic Manipulation

Shangning Xia, Hongjie Fang, Cewu Lu, Haoshu Fang

Year: 2025
Citations: 3

Abstract

Generalization in robotic manipulation remains a critical challenge, particularly when scaling to new environments with limited demonstrations. This paper introduces CAGE, a novel robotic manipulation policy designed to overcome these generalization barriers by integrating pretrained visual representations with a causal attention mechanism. CAGE leverages the feature-extraction capabilities of the vision foundation model DINOv2, combined with LoRA fine-tuning, for robust environment understanding. The policy further employs a causal perceiver for effective token compression and a diffusion-based action head with attention to enhance task-specific, fine-grained conditioning. With as few as 50 demonstrations from a single training environment, CAGE generalizes robustly across diverse visual changes in objects, backgrounds, and viewpoints. Extensive experiments show that CAGE significantly outperforms existing state-of-the-art RGB/RGB-D-based approaches on various manipulation tasks, especially under large distribution shifts. In similar environments, CAGE offers an average 42% increase in task completion rate. In unseen environments, where all baselines fail, CAGE achieves an average completion rate of 43% and an average success rate of 51%, marking a substantial advancement toward the practical deployment of robots in real-world settings. Project website: cage-policy.github.io.
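The abstract attributes CAGE's efficiency to two attention ideas: compressing many visual tokens into a few, and causal attention. The plain-Python sketch below illustrates only the generic versions of both. A small set of latent queries attends to all image-patch tokens (Perceiver-style cross-attention), and the resulting latents then attend to each other under a lower-triangular (causal) mask. This is not the authors' implementation. The function names, the sizes (256 patch tokens, 4 latents, width 8), and the omission of learned projections, multiple heads, residual connections, and normalization are illustrative assumptions, and the exact placement of the causal mask in CAGE's perceiver may differ.

```python
import math
import random


def transpose(a: list[list[float]]) -> list[list[float]]:
    return [list(col) for col in zip(*a)]


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    # (n x k) @ (k x m) -> (n x m)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]  # exp(-inf) == 0.0 for masked entries
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, causal=False):
    """Scaled dot-product attention; with causal=True, query i sees keys 0..i only."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = []
    for i, row in enumerate(scores):
        scaled = [
            s / math.sqrt(d) if (not causal or j <= i) else float("-inf")
            for j, s in enumerate(row)
        ]
        weights.append(softmax(scaled))
    return matmul(weights, v), weights


def compress_tokens(image_tokens, latents):
    """Hypothetical helper: compress N image tokens into len(latents) tokens."""
    # Cross-attention: latent queries attend to every image token (keys = values here).
    compressed, _ = attention(latents, image_tokens, image_tokens)
    # Causal self-attention among the compressed latents.
    return attention(compressed, compressed, compressed, causal=True)


random.seed(0)
image_tokens = [[random.gauss(0, 1) for _ in range(8)] for _ in range(256)]
latents = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
compressed, causal_weights = compress_tokens(image_tokens, latents)
print(len(image_tokens), "->", len(compressed))  # prints "256 -> 4"
```

In this toy setting, a downstream head would attend over 4 tokens instead of 256, and the causal mask means latent `i` mixes information only from latents `0..i`, so every entry above the diagonal of `causal_weights` is exactly zero.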

Keywords

Computer science, Artificial intelligence, Human–computer interaction
