首页 /研究 /Imagination at Inference: Synthesizing In-Hand Views for Robust Visuomotor Policy Inference
MANIPULATION

Imagination at Inference: Synthesizing In-Hand Views for Robust Visuomotor Policy Inference

Haoran Ding, Anqing Duan, Zezhou Sun, Dezhen Song, Yoshihiko Nakamura

发表年份
2025
访问权限
开放获取

摘要

Visual observations from different viewpoints can significantly influence the performance of visuomotor policies in robotic manipulation. Among these, egocentric (in-hand) views often provide crucial information for precise control. However, in some applications, equipping robots with dedicated in-hand cameras may pose challenges due to hardware constraints, system complexity, and cost. In this work, we propose to endow robots with imaginative perception - enabling them to 'imagine' in-hand observations from agent views at inference time. We achieve this via novel view synthesis (NVS), leveraging a fine-tuned diffusion model conditioned on the relative pose between the agent and in-hand views cameras. Specifically, we apply LoRA-based fine-tuning to adapt a pretrained NVS model (ZeroNVS) to the robotic manipulation domain. We evaluate our approach on both simulation benchmarks (RoboMimic and MimicGen) and real-world experiments using a Unitree Z1 robotic arm for a strawberry picking task. Results show that synthesized in-hand views significantly enhance policy inference, effectively recovering the performance drop caused by the absence of real in-hand cameras. Our method offers a scalable and hardware-light solution for deploying robust visuomotor policies, highlighting the potential of imaginative visual reasoning in embodied agents.

关键词

cs.RO

相关论文

查看 MANIPULATION 分类全部论文