Perspectives for Direct Interpretability in Multi-Agent Deep Reinforcement Learning
Yoann Poupart, Aurélie Beynier, Nicolas Maudet
- Year
- 2025
- Access
- Open access
Abstract
Multi-Agent Deep Reinforcement Learning (MADRL) was proven efficient in solving complex problems in robotics or games, yet most of the trained models are hard to interpret. While learning intrinsically interpretable models remains a prominent approach, its scalability and flexibility are limited in handling complex tasks or multi-agent dynamics. This paper advocates for direct interpretability, generating post hoc explanations directly from trained models, as a versatile and scalable alternative, offering insights into agents' behaviour, emergent phenomena, and biases without altering models' architectures. We explore modern methods, including relevance backpropagation, knowledge edition, model steering, activation patching, sparse autoencoders and circuit discovery, to highlight their applicability to single-agent, multi-agent, and training process challenges. By addressing MADRL interpretability, we propose directions aiming to advance active topics such as team identification, swarm coordination and sample efficiency.
Keywords
Related papers
Hierarchical decision-making for UAVs’ game via LLM enhanced multi-agent reinforcement learning
Xinyu Dong, Bo Li, Guangyu Zhang +2 more
Aerospace Science and Technology · 2026
Dynamic reconfiguration in multi-robot agent systems using embedded language models
Shokhikha Amalana Murdivien, Jongsu Park, Jumyung Um
Robotics and Computer-Integrated Manufacturing · 2026
Formation optimization and obstacle avoidance decision-making methods for cooperative coverage search of multi-UUVs in underwater wreck areas
Haomiao Yu, Zeyuan Zhang, Yantian Ma
Robotics and Autonomous Systems · 2026
Human-in-the-Loop Swarms: A Bionic Swarm Approach to Real-World Soil Mapping
Petras Swissler, Mohammadali Rashidioun, Nicholas Sahu +3 more
2026