$π_0$-EqM: Equilibrium Matching for Closed-Loop Vision-Language-Action Control

Huanming Liu, Congsheng Xu, Jianmin Ji, Yao Mu

Year
2026
Access
Open access

Abstract

Vision-Language-Action (VLA) models have become the most widely adopted paradigm for robotic manipulation because of their potential for task generalization. However, most generative flow-matching action decoders for VLA control are deployed with fixed sampling horizons, which limits state-dependent compute and temporal reuse across control cycles. We present $π_0$-EqM, which replaces the flow-matching expert in $π_0$ with an Equilibrium Matching (EqM) decoder while leaving the upstream VLA stack unchanged. Under a matched 300-step budget, $π_0$-EqM improves RoboTwin average success from 40.4% to 50.2% across 19 tasks and remains competitive on LIBERO, with its clearest gain on LIBERO-10 (87.0%). Two threshold scans reveal a task-dependent, non-monotonic relation between residual and success, which we term the stationarity-executability gap. The results suggest that inference depth in iterative VLA control is part of policy design, and they introduce an energy-based perspective on VLA models that may inform future work on composable action generation across tasks and embodiments.

Keywords

cs.RO
