Post-Convergence Sim-to-Real Policy Transfer: A Principled Alternative to Cherry-Picking
Dylan Khor, Bowen Weng
- Publication year: 2025
- Access: Open access
Abstract
Learning-based approaches, particularly reinforcement learning (RL), have become widely used for developing control policies for autonomous agents, such as locomotion policies for legged robots. RL training typically maximizes a predefined reward (or minimizes a corresponding cost/loss) by iteratively optimizing policies within a simulator. Starting from a randomly initialized policy, the empirical expected reward follows a trajectory with an overall increasing trend. While some policies become temporarily stuck in local optima, a well-defined training process generally converges to a reward level with noisy oscillations. However, selecting a policy for real-world deployment is rarely an analytical decision (i.e., simply choosing the one with the highest reward) and is instead often performed through trial and error. To improve sim-to-real transfer, most research focuses on the pre-convergence stage, employing techniques such as domain randomization, multi-fidelity training, adversarial training, and architectural innovations. However, these methods do not eliminate the inevitable convergence trajectory and noisy oscillations of rewards, leading to heuristic policy selection or cherry-picking. This paper addresses the post-convergence sim-to-real transfer problem by introducing a worst-case performance transference optimization approach, formulated as a convex quadratic-constrained linear programming problem. Extensive experiments demonstrate its effectiveness in transferring RL-based locomotion policies from simulation to real-world laboratory tests.