Convex Markov Games: A New Frontier for Multi-Agent Reinforcement Learning

Ian Gemp, Andreas Haupt, Luke Marris, Siqi Liu, Georgios Piliouras

Publication year: 2024
Access: Open access

Abstract

Behavioral diversity, expert imitation, fairness, safety goals and others give rise to preferences in sequential decision making domains that do not decompose additively across time. We introduce the class of convex Markov games that allow general convex preferences over occupancy measures. Despite infinite time horizon and strictly higher generality than Markov games, pure strategy Nash equilibria exist. Furthermore, equilibria can be approximated empirically by performing gradient descent on an upper bound of exploitability. Our experiments reveal novel solutions to classic repeated normal-form games, find fair solutions in a repeated asymmetric coordination game, and prioritize safe long-term behavior in a robot warehouse environment. In the prisoner's dilemma, our algorithm leverages transient imitation to find a policy profile that deviates from observed human play only slightly, yet achieves higher per-player utility while also being three orders of magnitude less exploitable.
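To make the phrase "preferences over occupancy measures" concrete, the following is a minimal single-agent sketch, not the authors' algorithm. It computes the discounted state-action occupancy measure of a fixed policy in a toy two-state MDP, then evaluates two objectives on it: an ordinary expected reward, which is linear in the occupancy measure, and the entropy of the occupancy measure, which is concave in it and does not decompose additively across time. The function names, the toy MDP, the reward table, and the discount factor are all illustrative assumptions; none of them come from the paper.

```python
import math

def occupancy_measure(P, pi, rho, gamma, iters=2000):
    """Discounted state-action occupancy measure of a fixed policy.

    P[s][a][s2]: transition probability, pi[s][a]: action probability,
    rho[s]: initial state distribution, gamma: discount factor in [0, 1).
    Iterates the fixed point d = (1 - gamma) * rho + gamma * P_pi^T d.
    """
    n_s, n_a = len(P), len(P[0])
    d = list(rho)
    for _ in range(iters):
        nxt = [(1 - gamma) * rho[s2] for s2 in range(n_s)]
        for s in range(n_s):
            for a in range(n_a):
                w = gamma * d[s] * pi[s][a]
                for s2 in range(n_s):
                    nxt[s2] += w * P[s][a][s2]
        d = nxt
    # mu(s, a) = d(s) * pi(a | s)
    return [[d[s] * pi[s][a] for a in range(n_a)] for s in range(n_s)]

def linear_utility(mu, r):
    """Standard expected (normalized) discounted reward: linear in mu."""
    return sum(mu[s][a] * r[s][a] for s in range(len(mu)) for a in range(len(mu[0])))

def entropy_utility(mu):
    """Entropy of the occupancy measure: concave in mu, not additive over time."""
    return -sum(p * math.log(p) for row in mu for p in row if p > 0)

# Hypothetical toy MDP: action 0 stays in the current state, action 1 switches.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
pi = [[0.5, 0.5], [0.5, 0.5]]   # uniform policy (assumption)
rho = [1.0, 0.0]                # always start in state 0 (assumption)
r = [[1.0, 0.0], [0.0, 0.0]]    # illustrative reward table

mu = occupancy_measure(P, pi, rho, gamma=0.9)
u_linear = linear_utility(mu, r)
u_entropy = entropy_utility(mu)
```

In this sketch the linear objective recovers the usual reward setting, while swapping in `entropy_utility` gives one example of the more general preferences the abstract describes. Extending it to several players would require each player's objective to depend on the joint policy profile, which this toy example does not attempt.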

Keywords

cs.GT, cs.AI, cs.MA
