Generalized Tsallis Entropy Reinforcement Learning and Its Application to Soft Mobile Robots
Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, Mineui Hong, Jaein Kim, Yong‐Lae Park, Songhwai Oh
- Year: 2020
- Citations: 20
- Access: Open access
Abstract
In this paper, we present a new class of entropy-regularized Markov decision processes (MDPs), referred to as Tsallis MDPs, that inherently generalize well-known maximum entropy reinforcement learning (RL) by introducing an additional real-valued parameter called an entropic index. Our theoretical results enable us to derive and analyze different types of optimal policies with interesting properties related to the stochasticity of the optimal policy, which are obtained by controlling the entropic index. To handle complex and model-free problems, such as learning a controller for a soft mobile robot, we propose a Tsallis actor-critic (TAC) method. We first observe that different RL problems have different desirable entropic indices, and that using a proper entropic index results in superior performance compared to state-of-the-art actor-critic methods. To mitigate the exhaustive search for the entropic index, we propose a quick-and-dirty curriculum method of gradually increasing the entropic index, referred to as TAC with Curricula (TAC²). TAC² shows comparable performance to TAC with the optimal entropic index. Finally, we apply TAC² to learn a controller for a soft mobile robot, where TAC² outperforms existing actor-critic methods in terms of both convergence speed and utility.
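To make the role of the entropic index concrete, the following is a minimal sketch of the Tsallis entropy of a discrete policy. It assumes the common q-logarithm form, ln_q(x) = (x^(q-1) - 1)/(q - 1) for q ≠ 1 and ln(x) for q = 1, which the abstract itself does not spell out. The function names `q_log` and `tsallis_entropy` and the example distributions are illustrative and not taken from the paper.

```python
import math


def q_log(x, q):
    """q-logarithm (assumed form): ln(x) if q == 1, else (x**(q-1) - 1) / (q - 1)."""
    if x <= 0.0:
        raise ValueError("q_log is only defined here for x > 0")
    if math.isclose(q, 1.0):
        return math.log(x)
    return (x ** (q - 1.0) - 1.0) / (q - 1.0)


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q(pi) = sum_a pi(a) * (-ln_q pi(a)).

    Zero-probability actions are skipped, since their contribution
    tends to zero for any q > 0.
    """
    return sum(-p * q_log(p, q) for p in probs if p > 0.0)


# Hypothetical example policies over four actions.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]

for q in (1.0, 1.5, 2.0):
    print(q, tsallis_entropy(uniform, q), tsallis_entropy(peaked, q))
```

Under this assumed form, q = 1 recovers the Shannon entropy used in standard maximum entropy RL, and q = 2 gives the quadratic form sum_a pi(a)(1 - pi(a)). A deterministic policy has zero entropy for every q.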