NeurIPS 2025

Multi-agent Markov Entanglement

Conference Paper · Main Conference Track · Artificial Intelligence · Machine Learning

Abstract

Value decomposition has long been a fundamental technique in multi-agent reinforcement learning and dynamic programming. Specifically, the value function of a global state $(s_1, s_2, \ldots, s_N)$ is often approximated as a sum of local value functions: $V(s_1, s_2, \ldots, s_N)\approx\sum_{i=1}^N V_i(s_i)$. This approach has found wide application in modern RL systems. However, the theoretical justification for why this decomposition works so effectively remains underexplored. In this paper, we uncover the underlying mathematical structure that enables value decomposition. We demonstrate that a Markov decision process (MDP) permits value decomposition *if and only if* its transition matrix is not "entangled", a concept analogous to quantum entanglement in physics. Drawing inspiration from how physicists measure quantum entanglement, we introduce a measure of "Markov entanglement" and show that it bounds the decomposition error in general multi-agent MDPs. Using the concept of Markov entanglement, we prove that a widely used class of policies, the index policy, is weakly entangled and enjoys a sublinear $\mathcal{O}(\sqrt{N})$ decomposition error for $N$-agent systems. Finally, we show that Markov entanglement can be estimated efficiently, guiding practitioners on the feasibility of value decomposition.
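To make the unentangled case concrete, here is a minimal sketch (an illustrative toy, not code from the paper): if the joint transition matrix factorizes as a Kronecker product $P = P_1 \otimes P_2$ and the reward is additive, $r(s_1, s_2) = r_1(s_1) + r_2(s_2)$, then the joint value function decomposes exactly as $V(s_1, s_2) = V_1(s_1) + V_2(s_2)$. All sizes, the discount factor, and helper names below are assumptions made for the example.

```python
import numpy as np

# Toy illustration (not from the paper): an "unentangled" two-agent
# chain whose joint transition matrix is a Kronecker (tensor) product
# of local transition matrices. With additive rewards, the joint value
# function then decomposes exactly into a sum of local value functions.

rng = np.random.default_rng(0)
gamma = 0.9          # discount factor (assumed for the example)
n1, n2 = 3, 4        # local state-space sizes for agents 1 and 2

def random_stochastic(n):
    """Random row-stochastic matrix: each row is a distribution."""
    m = rng.random((n, n))
    return m / m.sum(axis=1, keepdims=True)

P1, P2 = random_stochastic(n1), random_stochastic(n2)
r1, r2 = rng.random(n1), rng.random(n2)

# Unentangled joint dynamics and additive joint reward.
P = np.kron(P1, P2)               # P[(s1,s2),(s1',s2')] = P1[s1,s1'] * P2[s2,s2']
r = np.add.outer(r1, r2).ravel()  # r(s1,s2) = r1(s1) + r2(s2)

def evaluate(P, r, gamma):
    """Solve the Bellman evaluation equation V = r + gamma * P @ V."""
    return np.linalg.solve(np.eye(len(r)) - gamma * P, r)

V = evaluate(P, r, gamma)                                   # joint value function
V1, V2 = evaluate(P1, r1, gamma), evaluate(P2, r2, gamma)   # local value functions

# Exact decomposition: V(s1,s2) = V1(s1) + V2(s2), up to float error.
V_sum = np.add.outer(V1, V2).ravel()
print(np.max(np.abs(V - V_sum)))  # tiny (numerical noise): decomposition is exact
```

For entangled dynamics (a joint $P$ that is not a tensor product of local chains), the same check yields a nonzero gap; this gap is the decomposition error that the paper's entanglement measure is said to bound.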


Context

Venue
Annual Conference on Neural Information Processing Systems