Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning
Shentong Mo
- 发表年份
- 2026
- 访问权限
- 开放获取
摘要
Designing effective reward functions is a cornerstone of reinforcement learning (RL), yet it remains a challenging and labor-intensive process due to the inefficiencies and inconsistencies inherent in traditional methods. Existing methods often rely on extensive manual design and evaluation steps, which are prone to redundancy and overlook local uncertainties at intermediate decision points. To address these challenges, we propose the Chain of Uncertain Rewards (CoUR), a novel framework that integrates large language models (LLMs) to streamline reward function design and evaluation in RL environments. Specifically, our CoUR introduces code uncertainty quantification with a similarity selection mechanism that combines textual and semantic analyses to identify and reuse the most relevant reward function components. By reducing redundant evaluations and leveraging Bayesian optimization on decoupled reward terms, CoUR enables a more efficient and robust search for optimal reward feedback. We comprehensively evaluate CoUR across nine original environments from IsaacGym and all 20 tasks from the Bidexterous Manipulation benchmark. The experimental results demonstrate that CoUR not only achieves better performance but also significantly lowers the cost of reward evaluations.
关键词
相关论文
面向大型复杂构件的移动机器人辅助磨削技术综述
Yusen Li, Ziwei Wang, Xiangye Zhu 等 12 位作者
Robotics and Computer-Integrated Manufacturing · 2026
基于物理信息与机器学习的五轴铣削TC4钛合金刀具磨损融合预测模型
Shaoqing Qin, Lida Zhu, Yanpeng Hao 等 10 位作者
Robotics and Computer-Integrated Manufacturing · 2026
通过新型压电主动阻尼刀柄提升机器人铣削质量
Bo Li, Yuanbo Zhao, Huijie Xiao 等 6 位作者
Robotics and Computer-Integrated Manufacturing · 2026
一种利用磁致非线性宽带多向被动减振器抑制机器人铣削低频颤振的新方法
Hao Li, Yuhui Yu, Rui Fu 等 6 位作者
Robotics and Computer-Integrated Manufacturing · 2026