
MANIPULATION

Instant-Fold: In-Context Imitation Learning for Deformable Object Manipulation

Yilong Wang, Cheng Qian, Edward Johns

Year: 2026
Access: Open access

Abstract

Deformable object manipulation (DOM) is challenging because states are high-dimensional and partially observable, and they evolve through long-horizon, topology-changing interactions that admit multiple valid manipulation modes. We introduce Instant-Fold, an in-context imitation learning framework for DOM. Given a single human demonstration, our policy infers which manipulation mode was demonstrated, including variations in spatial execution and step ordering, and executes it without requiring gradient updates. Our approach first learns deformation-aware visual representations via temporal contrastive pretraining; a flow-matching transformer policy conditioned on the demonstration then predicts actions that execute the intended manipulation mode. Trained entirely in simulation, Instant-Fold generalizes across diverse folding modes and transfers zero-shot to real-world settings without additional data collection or finetuning. Videos are available at https://instant-fold.github.io.
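The abstract names temporal contrastive pretraining without spelling out the objective. A common form of such an objective is InfoNCE, in which embeddings of temporally nearby frames from the same rollout are pulled together while the other frames in the batch act as negatives. The NumPy sketch that follows illustrates that generic loss under stated assumptions; it is not the authors' implementation, and the function name `temporal_info_nce`, the temperature of 0.1, and the random stand-in embeddings are hypothetical.

```python
import numpy as np


def temporal_info_nce(anchors: np.ndarray, positives: np.ndarray,
                      temperature: float = 0.1) -> float:
    """Generic InfoNCE loss (hypothetical helper): row i of `positives` embeds a
    frame temporally close to the frame in row i of `anchors`; every other row
    in the batch is treated as a negative."""
    # L2-normalise so dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature              # (B, B) similarity matrix
    # Numerically stable row-wise log-softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The diagonal holds the true (anchor, positive) pairs
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 16))                   # stand-in embeddings of 8 frames
nearby = frames + 0.05 * rng.normal(size=(8, 16))   # hypothetical "slightly later" frames
print(temporal_info_nce(frames, nearby))            # small: positives match their anchors
print(temporal_info_nce(frames, nearby[::-1]))      # larger: pairs deliberately mismatched
```

Small perturbations stand in for the gradual cloth deformation between neighbouring frames; with a real encoder, minimising this loss would encourage embeddings that track how the object deforms over time.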
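For the second stage, the abstract states only that a flow-matching transformer conditioned on the demonstration predicts actions. The sketch that follows shows the generic flow-matching recipe on a straight noise-to-action path: training regresses a velocity field onto `action - noise`, and inference integrates that field from noise with Euler steps. It is an illustration under assumptions, not the paper's method: the transformer is replaced by a hypothetical `oracle_velocity` function, which is the exact optimum when the conditioned action distribution collapses to a single action, and the names `flow_matching_loss` and `sample_actions` and the 3-D action are illustrative.

```python
from typing import Callable

import numpy as np

# Hypothetical signature: velocity = v(x, t, cond); stands in for the transformer
VelocityField = Callable[[np.ndarray, float, np.ndarray], np.ndarray]


def flow_matching_loss(v: VelocityField, actions: np.ndarray, cond: np.ndarray,
                       rng: np.random.Generator) -> float:
    """Conditional flow-matching loss on the straight path from noise to actions."""
    x0 = rng.normal(size=actions.shape)     # noise sample at t = 0
    t = rng.uniform(0.0, 1.0)               # one time per call, for brevity
    xt = (1.0 - t) * x0 + t * actions       # point on the straight path
    target = actions - x0                   # the path's constant velocity
    return float(np.mean((v(xt, t, cond) - target) ** 2))


def sample_actions(v: VelocityField, cond: np.ndarray, shape: tuple,
                   rng: np.random.Generator, steps: int = 10) -> np.ndarray:
    """Euler-integrate dx/dt = v(x, t, cond) from noise (t = 0) to t = 1."""
    x = rng.normal(size=shape)
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * v(x, k * dt, cond)
    return x


def oracle_velocity(x: np.ndarray, t: float, cond: np.ndarray) -> np.ndarray:
    """Exact optimum when the conditioned action distribution is the single
    point `cond` (a toy stand-in for a trained network)."""
    return (cond - x) / (1.0 - t)


rng = np.random.default_rng(0)
demo_action = np.array([0.3, -0.1, 0.05])   # hypothetical 3-D end-effector offset
print(flow_matching_loss(oracle_velocity, demo_action, demo_action, rng))  # near zero
print(sample_actions(oracle_velocity, demo_action, demo_action.shape, rng))
```

In a full policy, `cond` would carry the encoded demonstration and current observation, and `v` would be the trained network; the closed-form oracle is used here only so the example runs without training.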

Keywords

deformable object manipulation, imitation learning, in-context learning, zero-shot transfer, flow-matching transformer
