Home /Research /CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors
OTHER

CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors

Dachong Li, ZhuangZhuang Chen, Jin Zhang, Jianqiang Li

Year
2026
Access
Open access

Abstract

Vision--Language--Action (VLA) models often use intermediate representations to connect multimodal inputs with continuous control, yet spatial guidance is often injected implicitly through latent features. We propose $CorridorVLA$, which predicts sparse spatial anchors as incremental physical changes (e.g., $Δ$-positions) and uses them to impose an explicit tolerance region in the training objective for action generation. The anchors define a corridor that guides a flow-matching action head: trajectories whose implied spatial evolution falls outside it receive corrective gradients, while minor deviations from contacts and execution noise are permitted. On the more challenging LIBERO-Plus benchmark, CorridorVLA yields consistent gains across both SmolVLA and GR00T, improving success rate by $3.4\%$--$12.4\%$ over the corresponding baselines; notably, our GR00T-Corr variant reaches a success rate of $83.21\%$. These results indicate that action-aligned physical cues can provide direct and interpretable constraints for generative action policies, complementing spatial guidance encoded in visual or latent forms. Code is available at https://github.com/corridorVLA.

Keywords

cs.ROcs.AI

Related papers

Browse all OTHER papers