
Released

Journal Article

Explaining Metastable Cooperation in Independent Multi-Agent Boltzmann Q-Learning—A Deterministic Approximation

Authors

Goll, David
External Organizations;

Barfuss, Wolfram
External Organizations;


Heitzig, Jobst
Potsdam Institute for Climate Impact Research;

Citation

Goll, D., Barfuss, W., Heitzig, J. (2026): Explaining Metastable Cooperation in Independent Multi-Agent Boltzmann Q-Learning—A Deterministic Approximation. - Applied Sciences, 16(7), 3524.
https://doi.org/10.3390/app16073524


Cite as: https://publications.pik-potsdam.de/pubman/item/item_34231
Abstract
Multi-agent reinforcement learning involves interacting agents whose learning processes are coupled through a shared environment. This work introduces a discrete-time approximation model for multi-agent Boltzmann Q-learning that accounts for agents’ update frequencies. We demonstrate why previous models do not accurately represent the actual stochastic learning dynamics, whereas our model reproduces several complex emergent dynamical regimes, including transient cooperation and metastable states in social dilemmas such as the Prisoner’s Dilemma. We show that increasing the discount factor can prevent convergence by inducing oscillations through a supercritical Neimark–Sacker bifurcation, which transforms the unique stable fixed point into a stable limit cycle. This analysis provides a deeper understanding of the complexities of multi-agent learning dynamics and the conditions under which convergence may not be achieved.
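For readers unfamiliar with the setting the abstract describes, the following is a minimal sketch of two independent Boltzmann Q-learners in a repeated Prisoner's Dilemma. All concrete values here (payoff matrix, learning rate `alpha`, discount factor `gamma`, exploration `temperature`) are illustrative assumptions, not parameters taken from the paper, and the update rule is generic single-state independent Q-learning with softmax action selection rather than the authors' approximation model.

```python
import math
import random

# Prisoner's Dilemma payoffs for the row player (illustrative values):
# actions are 0 = cooperate, 1 = defect.
PAYOFF = [[3, 0],
          [5, 1]]

def boltzmann(q, temperature):
    """Softmax (Boltzmann) action probabilities from a list of Q-values."""
    exps = [math.exp(v / temperature) for v in q]
    z = sum(exps)
    return [e / z for e in exps]

def simulate(steps=5000, alpha=0.1, gamma=0.9, temperature=1.0, seed=0):
    """Run two independent Boltzmann Q-learners against each other.

    Returns both agents' final Q-values and the fraction of rounds
    in which both agents cooperated.
    """
    rng = random.Random(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]
    mutual_coop = 0
    for _ in range(steps):
        # Each agent samples an action from its own Boltzmann policy.
        p1 = boltzmann(q1, temperature)
        p2 = boltzmann(q2, temperature)
        a1 = 0 if rng.random() < p1[0] else 1
        a2 = 0 if rng.random() < p2[0] else 1
        r1, r2 = PAYOFF[a1][a2], PAYOFF[a2][a1]
        # Independent Q-learning: each agent treats the other as part of
        # the environment; with a single state, the bootstrap target is
        # the agent's own max Q-value.
        q1[a1] += alpha * (r1 + gamma * max(q1) - q1[a1])
        q2[a2] += alpha * (r2 + gamma * max(q2) - q2[a2])
        mutual_coop += (a1 == 0 and a2 == 0)
    return q1, q2, mutual_coop / steps
```

Because the two learners are only coupled through the rewards, runs of this kind can pass through extended phases of mutual cooperation before settling elsewhere; the paper's contribution is a deterministic map that explains such behavior, which this stochastic sketch does not implement.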