MANIPULATION

FDPP: Fine-tune Diffusion Policy with Human Preference

Yuxin Chen, Devesh K. Jha, Masayoshi Tomizuka, Diego Romeres

Publication year
2025
Access
Open access

Abstract

Imitation learning from human demonstrations enables robots to perform complex manipulation tasks and has recently seen considerable success. However, these techniques often struggle to adapt behavior to new preferences or changes in the environment. To address these limitations, we propose Fine-tuning Diffusion Policy with Human Preference (FDPP). FDPP learns a reward function through preference-based learning. This reward is then used to fine-tune the pre-trained policy with reinforcement learning (RL), aligning the pre-trained policy with new human preferences while still solving the original task. Our experiments across various robotic tasks and preferences demonstrate that FDPP effectively customizes policy behavior without compromising performance. Additionally, we show that incorporating Kullback-Leibler (KL) regularization during fine-tuning prevents over-fitting and helps maintain the competencies of the initial policy.
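The two ingredients named in the abstract, a reward learned from preferences and a KL-regularized fine-tuning objective, can be illustrated with a minimal sketch. The abstract does not specify the preference model or the exact objective, so the Bradley-Terry likelihood used here is an assumption (it is a common choice in preference-based learning), and the names `preference_loss`, `kl_regularized_objective`, and `beta` are hypothetical, not taken from the paper.

```python
import math


def preference_loss(rewards_a, rewards_b, prefers_a):
    """Negative log-likelihood of one human preference label.

    Assumes a Bradley-Terry model (not confirmed by the abstract):
    P(A preferred) = sigmoid(sum(rewards_a) - sum(rewards_b)).
    rewards_a, rewards_b: predicted per-step rewards for two trajectory segments.
    prefers_a: True if the human preferred segment A, False for segment B.
    """
    # Total predicted reward of each segment.
    return_a, return_b = sum(rewards_a), sum(rewards_b)
    # Margin in favor of the segment the human actually chose.
    diff = (return_a - return_b) if prefers_a else (return_b - return_a)
    # -log(sigmoid(diff)) = softplus(-diff), computed in a numerically stable way.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def kl_regularized_objective(mean_reward, kl_to_pretrained, beta):
    """Quantity to maximize during fine-tuning (hypothetical form).

    The learned reward is traded off against a KL penalty that keeps the
    fine-tuned policy close to the pre-trained one; beta sets the strength.
    """
    return mean_reward - beta * kl_to_pretrained
```

In this sketch, minimizing `preference_loss` over labeled segment pairs pushes the reward model to rank preferred segments higher, and a larger `beta` keeps the fine-tuned policy closer to the pre-trained one at the cost of slower adaptation to the new preference.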

Keywords

cs.RO, cs.LG
