MSVIPER: Improved Policy Distillation for Reinforcement-Learning-Based Robot Navigation

Aaron M. Roth, Jing Liang, Ram Sriram, Elham Tabassi, Dinesh Manocha

Year of publication: 2022
Access: Open access

Abstract

We present Multiple Scenario Verifiable Reinforcement Learning via Policy Extraction (MSVIPER), a new method for distilling policies into decision trees for improved robot navigation. MSVIPER learns an "expert" policy using any Reinforcement Learning (RL) technique that learns a state-action mapping, and then uses imitation learning to learn a decision-tree policy from it. We demonstrate that MSVIPER produces efficient decision trees that accurately mimic the behavior of the expert policy. Moreover, we present efficient policy distillation and tree-modification techniques that take advantage of the decision-tree structure to allow improvements to a policy without retraining. We use our approach to improve the performance of RL-based robot navigation algorithms for indoor and outdoor scenes. We demonstrate reductions in freezing and oscillation behaviors of up to 95% for mobile robots navigating among dynamic obstacles, and reductions in vibration and oscillation of up to 17% for outdoor robot navigation on complex, uneven terrain.
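
The expert-to-tree pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the expert policy, the toy dynamics in step, the two start-range "scenarios", and the small Gini-based tree learner are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. The loop is plain dataset aggregation: states visited by the current tree policy are labeled by the expert, pooled across scenarios, and used to refit the tree.

```python
import random

def expert_policy(state):
    """Hypothetical stand-in for a trained RL expert (any state-to-action map)."""
    dist, bearing = state
    if dist > 0.5:
        return 0                      # 0 = go straight
    return 1 if bearing < 0.0 else 2  # 1 = turn right, 2 = turn left

def step(state, action, rng):
    """Toy dynamics (assumed): going straight closes distance, turning opens it."""
    dist, bearing = state
    dist += -0.1 if action == 0 else 0.15
    bearing += rng.uniform(-0.2, 0.2)
    return (min(max(dist, 0.0), 1.0), min(max(bearing, -1.0), 1.0))

def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def majority(labels):
    return max(sorted(set(labels)), key=labels.count)

def fit_tree(X, y, max_depth=4):
    """Greedy axis-aligned tree: a leaf is an action, a node is (feature, threshold, left, right)."""
    if max_depth == 0 or len(set(y)) == 1:
        return majority(y)
    best = None
    for f in range(len(X[0])):
        values = sorted(set(x[f] for x in X))
        for t in values[:-1]:  # splitting on "<= t" keeps both sides non-empty
            left = [lab for x, lab in zip(X, y) if x[f] <= t]
            right = [lab for x, lab in zip(X, y) if x[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # all states identical, nothing to split on
        return majority(y)
    _, f, t = best
    lo = [(x, lab) for x, lab in zip(X, y) if x[f] <= t]
    hi = [(x, lab) for x, lab in zip(X, y) if x[f] > t]
    return (f, t,
            fit_tree([x for x, _ in lo], [lab for _, lab in lo], max_depth - 1),
            fit_tree([x for x, _ in hi], [lab for _, lab in hi], max_depth - 1))

def predict(tree, state):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if state[f] <= t else right
    return tree

def distill(scenarios, iterations=3, episodes=5, horizon=10, seed=0):
    """Dataset-aggregation loop: roll out, label with the expert, refit the tree."""
    rng = random.Random(seed)
    states, labels, tree = [], [], None
    for _ in range(iterations):
        for lo, hi in scenarios:  # each scenario is an assumed start-distance range
            for _ in range(episodes):
                s = (rng.uniform(lo, hi), rng.uniform(-1.0, 1.0))
                for _ in range(horizon):
                    states.append(s)
                    labels.append(expert_policy(s))  # the expert labels every visited state
                    # First iteration follows the expert; later ones follow the tree.
                    a = expert_policy(s) if tree is None else predict(tree, s)
                    s = step(s, a, rng)
        tree = fit_tree(states, labels)
    return tree

def agreement(tree, n=1000, seed=1):
    """Fraction of uniformly sampled states where tree and expert pick the same action."""
    rng = random.Random(seed)
    test = [(rng.uniform(0.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(n)]
    return sum(predict(tree, s) == expert_policy(s) for s in test) / n

SCENARIOS = [(0.0, 0.5), (0.5, 1.0)]  # hypothetical "near obstacle" and "far" starts
tree = distill(SCENARIOS)
print(f"agreement with expert: {agreement(tree):.2f}")
```

Because the distilled policy is a nested tuple of (feature, threshold, left, right) nodes, an individual branch can be inspected or replaced directly. That is the structural property the abstract relies on when it mentions modifying a policy without retraining.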

Keywords

cs.RO, cs.AI, cs.HC, cs.LG
