HRI

PrefCLM: Enhancing Preference-Based Reinforcement Learning With Crowdsourced Large Language Models

Ruiqi Wang, Dezhong Zhao, Ziqin Yuan, Ike Obi, Byung‐Cheol Min

Year: 2025
Citations: 9

Abstract

Preference-based reinforcement learning (PbRL) is emerging as a promising approach to teaching robots through human comparative feedback without complex reward engineering. However, the substantial volume of human feedback required hinders broader applications. In this work, we introduce PrefCLM, a novel framework that utilizes crowdsourced large language models (LLMs) as synthetic teachers in PbRL. We utilize Dempster-Shafer Theory to fuse individual preference beliefs from multiple LLM agents at the score level, efficiently leveraging their diversity and collective intelligence. We also introduce a human-in-the-loop pipeline, enabling iterative and collective refinements that adapt to the nuanced and individualized preferences inherent to human-robot interaction (HRI) scenarios. Experimental results across various general RL tasks show that PrefCLM achieves competitive performance compared to expert-engineered scripted teachers and excels in facilitating more natural and efficient behaviors. A real-world user study (N = 10) further demonstrates its capability to tailor robot behaviors to individual user preferences, enhancing user satisfaction in HRI scenarios.
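The abstract does not spell out how the individual preference beliefs are constructed or fused, so the following is only a minimal sketch of Dempster's rule of combination, the core operation of Dempster-Shafer Theory. It assumes each LLM agent reports a mass function over a two-element frame (trajectory A preferred, trajectory B preferred) plus a mass on the whole frame representing ignorance. The names `combine`, `fuse`, and the example mass values are hypothetical and are not taken from the paper.

```python
from functools import reduce

# Assumed frame of discernment: trajectory "A" is preferred, or trajectory "B" is.
# THETA (the whole frame) carries the mass an agent leaves uncommitted.
A = frozenset({"A"})
B = frozenset({"B"})
THETA = frozenset({"A", "B"})


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions (dict: set -> mass)."""
    fused = {}
    conflict = 0.0
    for s1, v1 in m1.items():
        for s2, v2 in m2.items():
            inter = s1 & s2
            if inter:
                # Agreeing evidence accumulates on the intersection.
                fused[inter] = fused.get(inter, 0.0) + v1 * v2
            else:
                # Disjoint focal sets contribute to the conflict mass K.
                conflict += v1 * v2
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    # Renormalize by 1 - K so the fused masses sum to one.
    return {s: v / (1.0 - conflict) for s, v in fused.items()}


def fuse(masses):
    """Fold Dempster's rule over any number of agents."""
    return reduce(combine, masses)


# Hypothetical beliefs from three LLM agents about one trajectory pair.
agents = [
    {A: 0.6, B: 0.1, THETA: 0.3},
    {A: 0.5, B: 0.2, THETA: 0.3},
    {A: 0.2, B: 0.5, THETA: 0.3},
]

fused = fuse(agents)
preferred = "A" if fused.get(A, 0.0) > fused.get(B, 0.0) else "B"
print(fused, preferred)
```

In this sketch the fused label is simply the singleton with the larger combined mass; how PrefCLM turns fused beliefs into preference labels for reward learning may differ.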

Keywords

Preference, Reinforcement learning, Reinforcement, Computer science, Artificial intelligence, Psychology, Social psychology, Mathematics
