Home /Research /Imagination at Inference: Synthesizing In-Hand Views for Robust Visuomotor Policy Inference

MANIPULATION

Imagination at Inference: Synthesizing In-Hand Views for Robust Visuomotor Policy Inference

Haoran Ding, Anqing Duan, Zezhou Sun, Dezhen Song, Yoshihiko Nakamura

Year: 2025
Access: Open access

Abstract

Visual observations from different viewpoints can significantly influence the performance of visuomotor policies in robotic manipulation. Among these, egocentric (in-hand) views often provide crucial information for precise control. However, in some applications, equipping robots with dedicated in-hand cameras may pose challenges due to hardware constraints, system complexity, and cost. In this work, we propose to endow robots with imaginative perception - enabling them to 'imagine' in-hand observations from agent views at inference time. We achieve this via novel view synthesis (NVS), leveraging a fine-tuned diffusion model conditioned on the relative pose between the agent and in-hand views cameras. Specifically, we apply LoRA-based fine-tuning to adapt a pretrained NVS model (ZeroNVS) to the robotic manipulation domain. We evaluate our approach on both simulation benchmarks (RoboMimic and MimicGen) and real-world experiments using a Unitree Z1 robotic arm for a strawberry picking task. Results show that synthesized in-hand views significantly enhance policy inference, effectively recovering the performance drop caused by the absence of real in-hand cameras. Our method offers a scalable and hardware-light solution for deploying robust visuomotor policies, highlighting the potential of imaginative visual reasoning in embodied agents.

Keywords

cs.RO

Imagination at Inference: Synthesizing In-Hand Views for Robust Visuomotor Policy Inference

Abstract

Keywords

Related papers

Real-Time Obstacle Avoidance for Manipulators and Mobile Robots

A Mathematical Introduction to Robotic Manipulation

Robot dynamics and control

A tutorial on visual servo control