LEARNING

Interpretable Policy Specification and Synthesis through Natural Language and RL

Pradyumna Tambwekar, Andrew Silva, Nakul Gopalan, Matthew Gombolay

Publication year
2021
Citations
4

Abstract

Policy specification is a process by which a human can initialize a robot's behaviour and, in turn, warm-start policy optimization via Reinforcement Learning (RL). While policy specification/design is inherently a collaborative process, modern methods based on Learning from Demonstration or Deep RL lack the model interpretability and accessibility to be classified as such. Current state-of-the-art methods for policy specification rely on black-box models, which are an insufficient means of collaboration for non-expert users: these models provide no means of inspecting policies learnt by the agent and are not focused on creating a usable modality for teaching robot behaviour. In this paper, we propose a novel machine learning framework that enables humans to 1) specify, through natural language, interpretable policies in the form of easy-to-understand decision trees, 2) leverage these policies to warm-start reinforcement learning and 3) outperform baselines that lack our natural language initialization mechanism. We train our approach by collecting a first-of-its-kind corpus mapping free-form natural language policy descriptions to decision tree-based policies. We show that our novel framework translates natural language to decision trees with 96% and 97% accuracy on a held-out corpus across two domains, respectively. Finally, we validate that policies initialized with natural language commands are able to significantly outperform relevant baselines (p < 0.001) that do not benefit from our natural language-based warm-start technique.

Keywords

Computer science, Reinforcement learning, Interpretability, Natural language, Artificial intelligence, Leverage (statistics), Machine learning, Initialization, Usable, Natural language understanding
