Visual Explanation using Attention Mechanism in Actor-Critic-based Deep Reinforcement Learning

Hidenori Itaya, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Komei Sugiura

Year: 2021
Citations: 22

Abstract

Deep reinforcement learning (DRL) has great potential for acquiring optimal actions in complex environments such as games and robot control. However, it is difficult to analyze the agent's decision-making, i.e., the reasons it selects the actions it has learned. In this work, we propose Mask-Attention A3C (Mask A3C), which introduces an attention mechanism into Asynchronous Advantage Actor-Critic (A3C), an actor-critic-based DRL method, so that the decision-making of a DRL agent can be analyzed. A3C consists of a feature extractor that extracts features from an image, a policy branch that outputs the policy, and a value branch that outputs the state value. We focus on the policy and value branches and introduce an attention mechanism into each of them. The attention mechanism masks the feature maps of each branch using mask-attention, which expresses the basis for the policy and the state value as a heat map. We visualized mask-attention maps for Atari 2600 games and found that they make it easy to analyze the reasons behind an agent's decision-making in various game tasks. Furthermore, experimental results showed that introducing the attention mechanism allows the agent to achieve higher performance.
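The branch structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the mask is computed, so the sketch assumes a 1x1 convolution followed by a sigmoid, and it replaces the real output layers with single linear maps. All names (`mask_attention`, `forward`, `init_params`, the parameter keys) and all shapes are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - np.max(x))  # shift for numerical stability
    return e / e.sum()


def mask_attention(features, w_att, b_att):
    """Compute a spatial mask and apply it to the feature maps.

    features: (C, H, W) output of the shared feature extractor.
    w_att, b_att: weights of an assumed 1x1 convolution (C -> 1 channel).
    Returns the (H, W) mask and the masked (C, H, W) features.
    """
    logits = np.tensordot(w_att, features, axes=([0], [0])) + b_att  # (H, W)
    mask = sigmoid(logits)  # assumption: sigmoid keeps the mask in [0, 1]
    return mask, features * mask[None, :, :]


def forward(features, params):
    """Policy and value branches, each with its own mask-attention."""
    pi_mask, pi_feat = mask_attention(
        features, params["w_att_pi"], params["b_att_pi"])
    v_mask, v_feat = mask_attention(
        features, params["w_att_v"], params["b_att_v"])
    policy = softmax(params["w_pi"] @ pi_feat.ravel())  # action probabilities
    value = float(params["w_v"] @ v_feat.ravel())       # scalar state value
    return policy, value, pi_mask, v_mask


def init_params(c, h, w, n_actions, seed=0):
    """Random placeholder weights; a real agent would learn these."""
    rng = np.random.default_rng(seed)
    return {
        "w_att_pi": rng.standard_normal(c) * 0.1,
        "b_att_pi": 0.0,
        "w_att_v": rng.standard_normal(c) * 0.1,
        "b_att_v": 0.0,
        "w_pi": rng.standard_normal((n_actions, c * h * w)) * 0.01,
        "w_v": rng.standard_normal(c * h * w) * 0.01,
    }


if __name__ == "__main__":
    params = init_params(c=8, h=7, w=7, n_actions=4)
    feats = np.random.default_rng(1).standard_normal((8, 7, 7))
    policy, value, pi_mask, v_mask = forward(feats, params)
    print(policy.shape, pi_mask.shape, v_mask.shape)
```

In this sketch, `pi_mask` and `v_mask` are the quantities one would upsample to the input resolution and overlay on the game frame as heat maps, one for the policy and one for the state value.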

Keywords

Reinforcement learning, Computer science, Artificial intelligence, Asynchronous communication, Feature (linguistics), Value network, Action (physics), Mechanism (biology), Focus (optics), Value (mathematics)

