Discovering Diverse Solutions in Deep Reinforcement Learning by Maximizing State-Action-Based Mutual Information
Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama
Year: 2021
Access: Open access
Abstract
Reinforcement learning algorithms are typically limited to learning a single solution for a specified task, even though diverse solutions often exist. Recent studies have shown that learning a set of diverse solutions is beneficial because diversity enables robust few-shot adaptation. Existing methods learn diverse solutions by using the mutual information as an unsupervised reward, but this approach often suffers from bias in the gradient estimator induced by value function approximation. In this study, we propose a novel method that learns diverse solutions without suffering from this bias. In our method, a policy conditioned on a continuous or discrete latent variable is trained by directly maximizing the variational lower bound of the mutual information, rather than by using the mutual information as an unsupervised reward as in previous studies. Through extensive experiments on robot locomotion tasks, we demonstrate that the proposed method successfully learns an infinite set of diverse solutions by learning continuous latent variables, which is more challenging than learning a finite number of solutions. We also show that our method enables more effective few-shot adaptation than existing methods.
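The abstract does not spell out the bound it maximizes. As a rough illustration only, the sketch below evaluates the standard variational (Barber-Agakov) lower bound I(z; s, a) >= H(z) + E[log q(z | s, a)] for a discrete latent variable. The function name, the argument layout, and the use of a discrete prior are assumptions made for this example and are not taken from the paper.

```python
import math


def variational_mi_lower_bound(prior, samples):
    """Estimate H(z) + E[log q(z | s, a)], a lower bound on I(z; s, a).

    prior:   list of probabilities p(z) over K discrete latent values
             (hypothetical input format).
    samples: list of (z_index, q_probs) pairs, where q_probs[k] is the
             discriminator's estimate q(z = k | s, a) for a state-action
             pair that was generated while acting under latent z_index.
    """
    # Entropy of the latent prior, H(z).
    entropy = -sum(p * math.log(p) for p in prior if p > 0.0)
    # Monte Carlo estimate of E[log q(z | s, a)] over the sampled pairs.
    avg_log_q = sum(math.log(q[z]) for z, q in samples) / len(samples)
    return entropy + avg_log_q


# Uniform prior over two latent values.
prior = [0.5, 0.5]

# A discriminator that cannot tell the latents apart: the bound is zero.
uninformative = [(0, [0.5, 0.5]), (1, [0.5, 0.5])]

# A discriminator that always identifies the latent: the bound equals H(z).
informative = [(0, [1.0, 0.0]), (1, [0.0, 1.0])]

bound_low = variational_mi_lower_bound(prior, uninformative)
bound_high = variational_mi_lower_bound(prior, informative)
```

In this toy setting the bound grows as state-action pairs become easier to attribute to the latent that produced them, which is the sense in which maximizing it encourages behaviors that differ across latent values.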