A dynamical clipping approach with task feedback for Proximal Policy Optimization
Ziqi Zhang, Jingzehua Xu, Zifeng Zhuang, Hongyin Zhang, Jinxin Liu, Donglin wang, Shuai Zhang
- 发表年份
- 2023
- 访问权限
- 开放获取
摘要
Proximal Policy Optimization (PPO) has been broadly applied to robotics learning, showcasing stable training performance. However, the fixed clipping bound setting may limit the performance of PPO. Specifically, there is no theoretical proof that the optimal clipping bound remains consistent throughout the entire training process. Meanwhile, previous researches suggest that a fixed clipping bound restricts the policy's ability to explore. Therefore, many past studies have aimed to dynamically adjust the PPO clipping bound to enhance PPO's performance. However, the objective of these approaches are not directly aligned with the objective of reinforcement learning (RL) tasks, which is to maximize the cumulative Return. Unlike previous clipping approaches, we propose a bi-level proximal policy optimization objective that can dynamically adjust the clipping bound to better reflect the preference (maximizing Return) of these RL tasks. Based on this bi-level proximal policy optimization paradigm, we introduce a new algorithm named Preference based Proximal Policy Optimization (Pb-PPO). Pb-PPO utilizes a multi-armed bandit approach to refelect RL preference, recommending the clipping bound for PPO that can maximizes the current Return. Therefore, Pb-PPO results in greater stability and improved performance compared to PPO with a fixed clipping bound. We test Pb-PPO on locomotion benchmarks across multiple environments, including Gym-Mujoco and legged-gym. Additionally, we validate Pb-PPO on customized navigation tasks. Meanwhile, we conducted comparisons with PPO using various fixed clipping bounds and various of clipping approaches. The experimental results indicate that Pb-PPO demonstrates superior training performance compared to PPO and its variants. Our codebase has been released at : https://github.com/stevezhangzA/pb_ppo
关键词
相关论文
基于非线性滑模模型预测控制与自适应跟随转向及动静态约束的六轮独立驱动/四轮独立转向无人地面车辆轨迹跟踪控制
Shengyang Lu, Guanpeng Chen, Lijing Zhao 等 5 位作者
Robotics and Autonomous Systems · 2026
仿生水下机器人:材料、设计、控制与应用进展
Dilip Muchhala, Pramod Kumar Maurya, Adarsh Raut 等 6 位作者
Robotics and Autonomous Systems · 2026
刚柔混合连杆人形机器人的建模与控制
Zewen He, Taiki Ishigaki, Ko Yamamoto
Robotics and Autonomous Systems · 2026
人-外骨骼-助行器系统的人工推动自适应协调控制
Xinhao Zhang, Chen Yang, Chaobin Zou 等 7 位作者
Robotics and Autonomous Systems · 2026