PrefCLM: Enhancing Preference-Based Reinforcement Learning With Crowdsourced Large Language Models
Ruiqi Wang, Dezhong Zhao, Ziqin Yuan, Ike Obi, Byung‐Cheol Min
- Year: 2025
- Citations: 9
Abstract
Preference-based reinforcement learning (PbRL) is emerging as a promising approach to teaching robots through human comparative feedback without complex reward engineering. However, the substantial volume of human feedback required hinders broader applications. In this work, we introduce PrefCLM, a novel framework that utilizes crowdsourced large language models (LLMs) as synthetic teachers in PbRL. We utilize Dempster-Shafer Theory to fuse individual preference beliefs from multiple LLM agents at the score level, efficiently leveraging their diversity and collective intelligence. We also introduce a human-in-the-loop pipeline, enabling iterative and collective refinements that adapt to the nuanced and individualized preferences inherent to human-robot interaction (HRI) scenarios. Experimental results across various general RL tasks show that PrefCLM achieves competitive performance compared to expert-engineered scripted teachers and excels in facilitating more natural and efficient behaviors. A real-world user study (N = 10) further demonstrates its capability to tailor robot behaviors to individual user preferences, enhancing user satisfaction in HRI scenarios.
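The score-level fusion mentioned in the abstract can be illustrated with Dempster's rule of combination. The following is a minimal sketch, not the authors' implementation: it assumes each LLM agent's belief about a pair of trajectory segments has already been turned into a mass function over three hypotheses (segment A preferred, segment B preferred, undecided). The names `combine`, `fuse`, `A`, `B`, `AB` and the example masses are hypothetical and chosen only for illustration; how PrefCLM actually derives masses from LLM scores is not specified here.

```python
from itertools import product

# Frame of discernment: which of two trajectory segments is preferred.
A = frozenset({"a"})   # segment A preferred
B = frozenset({"b"})   # segment B preferred
AB = A | B             # undecided (mass assigned to the whole frame)


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps frozenset hypotheses to non-negative
    weights that sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (s1, w1), (s2, w2) in product(m1.items(), m2.items()):
        intersection = s1 & s2
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + w1 * w2
        else:
            # Mass falling on the empty set measures disagreement.
            conflict += w1 * w2
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize so the remaining masses sum to 1.
    return {s: w / norm for s, w in combined.items()}


def fuse(masses):
    """Fold Dempster's rule over a list of mass functions."""
    result = masses[0]
    for m in masses[1:]:
        result = combine(result, m)
    return result


# Hypothetical beliefs from two LLM agents (illustrative numbers only).
agent_1 = {A: 0.6, B: 0.1, AB: 0.3}
agent_2 = {A: 0.5, B: 0.2, AB: 0.3}
fused = fuse([agent_1, agent_2])
preferred = "A" if fused.get(A, 0.0) > fused.get(B, 0.0) else "B"
```

In this sketch the fused mass on each singleton hypothesis could serve as a soft preference label for reward learning, while the mass left on `AB` reflects residual uncertainty across agents. That use of the fused result is an assumption made for the example.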