Iteratively Learn Diverse Strategies with State Distance Information
Wei Fu, Weihua Du, Jingwei Li, Sunli Chen, Jingzhao Zhang, Yi Wu
- Publication year: 2023
- Access: Open access
Abstract
In complex reinforcement learning (RL) problems, policies with similar rewards may have substantially different behaviors. It remains a fundamental challenge to optimize rewards while also discovering as many diverse strategies as possible, which can be crucial in many practical applications. Our study examines two design choices for tackling this challenge, i.e., diversity measure and computation framework. First, we find that with existing diversity measures, visually indistinguishable policies can still yield high diversity scores. To accurately capture the behavioral difference, we propose to incorporate the state-space distance information into the diversity measure. In addition, we examine two common computation frameworks for this problem, i.e., population-based training (PBT) and iterative learning (ITR). We show that although PBT is the precise problem formulation, ITR can achieve comparable diversity scores with higher computation efficiency, leading to improved solution quality in practice. Based on our analysis, we further combine ITR with two tractable realizations of the state-distance-based diversity measures and develop a novel diversity-driven RL algorithm, State-based Intrinsic-reward Policy Optimization (SIPO), with provable convergence properties. We empirically examine SIPO across three domains from robot locomotion to multi-agent games. In all of our testing environments, SIPO consistently produces strategically diverse and human-interpretable policies that cannot be discovered by existing baselines.
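The abstract's two ingredients, an intrinsic reward built from state-space distance and an iterative (ITR) loop that learns one policy at a time, can be illustrated with a minimal sketch. This is not the paper's SIPO implementation: the nearest-neighbor Gaussian bonus, the names `state_distance_bonus`, `shaped_reward`, and `iterative_learning`, and the `bandwidth` and `weight` parameters are all assumptions chosen for illustration.

```python
import math


def state_distance_bonus(state, reference_states, bandwidth=1.0):
    """Hypothetical intrinsic reward based on state-space distance.

    Returns 0 when `state` coincides with a state visited by an earlier
    policy and approaches 1 as it moves away from all of them.
    """
    if not reference_states:
        return 0.0  # first iteration: nothing to be diverse from
    # Squared Euclidean distance to the nearest reference state.
    nearest_sq = min(
        sum((a - b) ** 2 for a, b in zip(state, ref))
        for ref in reference_states
    )
    return 1.0 - math.exp(-nearest_sq / (2.0 * bandwidth ** 2))


def shaped_reward(task_reward, state, reference_states, weight=0.5):
    """Task reward plus a weighted diversity bonus (weight is an assumption)."""
    return task_reward + weight * state_distance_bonus(state, reference_states)


def iterative_learning(train_policy, rollout_states, num_policies):
    """ITR-style loop: train policies one at a time.

    `train_policy(reference_states)` and `rollout_states(policy)` are
    placeholder callables supplied by the caller; each new policy is trained
    against the states visited by all previously learned policies.
    """
    reference_states = []
    policies = []
    for _ in range(num_policies):
        policy = train_policy(reference_states)
        policies.append(policy)
        reference_states.extend(rollout_states(policy))
    return policies
```

In this sketch each iteration only needs the stored states of earlier policies, which is one way to read the abstract's point that ITR is cheaper than optimizing a whole population jointly.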