Generalized Tsallis Entropy Reinforcement Learning and Its Application to Soft Mobile Robots

Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, Mineui Hong, Jaein Kim, Yong‐Lae Park, Songhwai Oh

Year: 2020
Citations: 20
Access: Open access

Abstract

In this paper, we present a new class of entropy-regularized Markov decision processes (MDPs), referred to as Tsallis MDPs, which generalize well-known maximum entropy reinforcement learning (RL) by introducing an additional real-valued parameter called the entropic index. Our theoretical results enable us to derive and analyze different types of optimal policies with interesting properties related to the stochasticity of the optimal policy, which is controlled by the entropic index. To handle complex, model-free problems, such as learning a controller for a soft mobile robot, we propose a Tsallis actor-critic (TAC) method. We first observe that different RL problems have different desirable entropic indices, and that using a proper entropic index results in superior performance compared to state-of-the-art actor-critic methods. To avoid an exhaustive search over the entropic index, we propose a quick-and-dirty curriculum method that gradually increases the entropic index, referred to as TAC with Curricula (TAC²). TAC² shows performance comparable to TAC with the optimal entropic index. Finally, we apply TAC² to learn a controller for a soft mobile robot, where TAC² outperforms existing actor-critic methods in terms of both convergence speed and utility.
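To make the role of the entropic index concrete, the following is a minimal sketch of the standard Tsallis entropy of a discrete distribution, written with the q-logarithm. It is an illustration under stated assumptions, not the authors' implementation: the function names (`q_log`, `tsallis_entropy`, `linear_index_schedule`) are hypothetical, the normalization used in the paper may differ by a constant factor, and the linear schedule with endpoints 1.0 and 2.0 is only one possible way to "gradually increase" the index.

```python
import math


def q_log(x, q):
    """q-logarithm: (x**(q-1) - 1) / (q - 1), which reduces to ln(x) at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (q - 1.0) - 1.0) / (q - 1.0)


def tsallis_entropy(probs, q):
    """Standard Tsallis entropy S_q(p) = -sum_i p_i * ln_q(p_i), for q > 0.

    Zero-probability entries contribute nothing and are skipped.
    """
    return -sum(p * q_log(p, q) for p in probs if p > 0.0)


def linear_index_schedule(step, total_steps, q_start=1.0, q_end=2.0):
    """Hypothetical curriculum: move the entropic index linearly from
    q_start to q_end over total_steps (an assumption, not the paper's schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return q_start + frac * (q_end - q_start)


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]
    for step in (0, 50, 100):
        q = linear_index_schedule(step, 100)
        print(f"step={step:3d}  q={q:.2f}  S_q(uniform)={tsallis_entropy(uniform, q):.4f}")
```

At q = 1 the function returns the Shannon entropy, which corresponds to the familiar maximum entropy RL setting. Larger q changes how the regularizer penalizes concentrated distributions, which is the lever the abstract describes for controlling policy stochasticity.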

Keywords

Reinforcement learning, Tsallis entropy, Computer science, Mobile robot, Entropy (arrow of time), Robot, Artificial intelligence, Machine learning, Pattern recognition (psychology), Physics
