Reinforcement Learning via Recurrent Convolutional Neural Networks

Tanmay Shankar, Santosha K. Dwivedy, Prithwijit Guha

Year of publication: 2016
Citations: 6

Abstract

Deep Reinforcement Learning has enabled the learning of policies for complex tasks in partially observable environments, without explicitly learning the underlying model of the tasks. While such model-free methods do achieve considerable performance, they often ignore the structure of the task. We present a more natural representation of the solutions to Reinforcement Learning (RL) problems, within three Recurrent Convolutional Neural Network (RCNN) architectures, to better exploit this inherent structure. The forward passes of the three RCNNs execute an efficient Value Iteration, propagate beliefs of state in partially observable environments, and choose optimal actions, respectively. Applying back-propagation to these RCNNs allows the system to explicitly learn the Transition Model and Reward Function associated with the underlying MDP, serving as an elegant alternative to classical model-based RL. We evaluate the proposed algorithms in simulation, considering a robot planning problem. We demonstrate the capability of our framework to reduce the cost of re-planning, learn accurate MDP models, and finally re-plan with learned models to achieve near-optimal policies.
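
To make the idea of Value Iteration as a recurrent convolution more concrete, the following is a minimal sketch and not the authors' architecture. It assumes a 2D grid world with deterministic moves, one 3x3 transition kernel per action, a state-only reward, and zero value outside the grid. All names (`shift_kernel`, `correlate2d_same`, `value_iteration`) and the 5x5 example are hypothetical and chosen only for illustration.

```python
import numpy as np

def shift_kernel(dy, dx):
    """3x3 transition kernel for a deterministic move by (dy, dx) (assumed model)."""
    k = np.zeros((3, 3))
    k[1 + dy, 1 + dx] = 1.0
    return k

def correlate2d_same(v, k):
    """Same-size 2D cross-correlation of v with a 3x3 kernel, zero padded."""
    p = np.pad(v, 1)  # zero padding: states outside the grid have value 0
    out = np.zeros_like(v)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[i:i + v.shape[0], j:j + v.shape[1]]
    return out

def value_iteration(reward, kernels, gamma=0.9, iters=50):
    """Each iteration is one recurrent step: convolve, add reward, max over actions."""
    v = np.zeros_like(reward)
    for _ in range(iters):
        # Q[a] = R + gamma * (expected next-state value under action a)
        q = np.stack([reward + gamma * correlate2d_same(v, k) for k in kernels])
        v = q.max(axis=0)  # max over the action channel
    return v, q.argmax(axis=0)

# Hypothetical 5x5 example: a single rewarding cell at row 2, column 4.
reward = np.zeros((5, 5))
reward[2, 4] = 1.0
# Action order: up, down, left, right.
kernels = [shift_kernel(-1, 0), shift_kernel(1, 0),
           shift_kernel(0, -1), shift_kernel(0, 1)]
values, policy = value_iteration(reward, kernels)
```

In this sketch the convolution plays the role of the expectation over next states and the max over the stacked action channel plays the role of the Bellman maximization. A stochastic transition model would correspond to kernels whose entries spread probability mass over several neighbouring cells.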

Keywords

Reinforcement learning, Computer science, Bellman equation, Exploit, Artificial intelligence, Function (biology), Representation (politics), Task (project management), Observable, Convolutional neural network
