
MANIPULATION

RoDiF: Robust Direct Fine-Tuning of Diffusion Policies with Corrupted Human Feedback

Amitesh Vatsa, Zhixian Xie, Wanxin Jin

Year: 2026
Access: Open access

Abstract

Diffusion policies are a powerful paradigm for robotic control, but fine-tuning them with human preferences is fundamentally challenged by the multi-step structure of the denoising process. To overcome this, we introduce a Unified Markov Decision Process (MDP) formulation that coherently integrates the diffusion denoising chain with environmental dynamics, enabling reward-free Direct Preference Optimization (DPO) for diffusion policies. Building on this formulation, we propose RoDiF (Robust Direct Fine-Tuning), a method that explicitly addresses corrupted human preferences. RoDiF reinterprets the DPO objective through a geometric hypothesis-cutting perspective and employs a conservative cutting strategy to achieve robustness without assuming any specific noise distribution. Extensive experiments on long-horizon manipulation tasks show that RoDiF consistently outperforms state-of-the-art baselines, effectively steering pretrained diffusion policies of diverse architectures to human-preferred modes, while maintaining strong performance even under 30% corrupted preference labels.
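The abstract says the Unified MDP makes reward-free DPO applicable to a diffusion policy. As a rough illustration only, the sketch below computes the standard DPO loss for one preference pair, taking each trajectory's log-likelihood as the sum of Gaussian log-densities of every denoising transition at every environment step. The chain layout, the isotropic-Gaussian step model, the helper names (`gaussian_logpdf`, `trajectory_logprob`, `log_sigmoid`, `dpo_loss`) and the default `beta=0.1` are all hypothetical assumptions. This is plain DPO, not RoDiF's conservative cutting objective, because the abstract does not give that objective's form.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log-density of an isotropic Gaussian N(mean, std^2 I) evaluated at x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq / std**2 - d * math.log(std) - 0.5 * d * math.log(2 * math.pi)


def trajectory_logprob(chain):
    """Sum the log-probs of all denoising transitions across all env steps.

    Hypothetical layout: `chain` is a list over environment steps; each entry
    is a list of (x_next, mean, std) tuples, one per denoising step.
    """
    return sum(gaussian_logpdf(x, m, s) for step in chain for (x, m, s) in step)


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(lp_w, lp_l, ref_w, ref_l, beta=0.1):
    """Standard DPO loss for one pair where trajectory w is preferred over l.

    lp_* are log-probs under the policy being fine-tuned; ref_* are log-probs
    under the frozen pretrained (reference) policy.
    """
    margin = beta * ((lp_w - ref_w) - (lp_l - ref_l))
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Two env steps, one denoising step each (toy 1-D actions).
    chain = [[([0.1], [0.0], 1.0)], [([-0.2], [0.0], 1.0)]]
    lp = trajectory_logprob(chain)
    print(dpo_loss(lp + 0.5, lp, lp, lp, beta=1.0))
```

Because the loss depends only on log-probability ratios against a frozen reference policy, no reward model is needed. When the policy and reference assign identical log-probabilities, the loss equals log 2, and it falls below that as the policy favors the preferred trajectory more than the reference does.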

Keywords

cs.RO, cs.LG
