
Released

Journal Article

Explaining Metastable Cooperation in Independent Multi-Agent Boltzmann Q-Learning—A Deterministic Approximation

Authors

Goll, David
External Organizations;

Barfuss, Wolfram
External Organizations;


Heitzig, Jobst
Potsdam Institute for Climate Impact Research;

Citation

Goll, D., Barfuss, W., Heitzig, J. (2026): Explaining Metastable Cooperation in Independent Multi-Agent Boltzmann Q-Learning—A Deterministic Approximation. - Applied Sciences, 16(7), 3524.
https://doi.org/10.3390/app16073524


Cite as: https://publications.pik-potsdam.de/pubman/item/item_34231
Abstract
Multi-agent reinforcement learning involves interacting agents whose learning processes are coupled through a shared environment. This work introduces a discrete-time approximation model for multi-agent Boltzmann Q-learning that accounts for agents’ update frequencies. We demonstrate why previous models do not accurately represent the actual stochastic learning dynamics, whereas our model reproduces several complex emergent dynamical regimes, including transient cooperation and metastable states in social dilemmas such as the Prisoner’s Dilemma. We show that increasing the discount factor can prevent convergence by inducing oscillations through a supercritical Neimark–Sacker bifurcation, which transforms the unique stable fixed point into a stable limit cycle. This analysis provides a deeper understanding of the complexities of multi-agent learning dynamics and the conditions under which convergence may not be achieved.
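For readers unfamiliar with the setting the abstract describes, the following is a minimal sketch of two independent Boltzmann Q-learners in a repeated Prisoner's Dilemma. All concrete values here (payoff matrix, learning rate `alpha`, discount factor `gamma`, exploration `temperature`) are illustrative assumptions, not parameters taken from the paper, and the update rule is generic single-state independent Q-learning with softmax action selection rather than the authors' approximation model.

```python
import math
import random

# Prisoner's Dilemma payoffs for the row player (illustrative values):
# actions are 0 = cooperate, 1 = defect.
PAYOFF = [[3, 0],
          [5, 1]]

def boltzmann(q, temperature):
    """Softmax (Boltzmann) action probabilities from a list of Q-values."""
    exps = [math.exp(v / temperature) for v in q]
    z = sum(exps)
    return [e / z for e in exps]

def simulate(steps=5000, alpha=0.1, gamma=0.9, temperature=1.0, seed=0):
    """Run two independent Boltzmann Q-learners against each other.

    Returns both agents' final Q-values and the fraction of rounds
    in which both agents cooperated.
    """
    rng = random.Random(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]
    mutual_coop = 0
    for _ in range(steps):
        # Each agent samples an action from its own Boltzmann policy.
        p1 = boltzmann(q1, temperature)
        p2 = boltzmann(q2, temperature)
        a1 = 0 if rng.random() < p1[0] else 1
        a2 = 0 if rng.random() < p2[0] else 1
        r1, r2 = PAYOFF[a1][a2], PAYOFF[a2][a1]
        # Independent Q-learning: each agent treats the other as part of
        # the environment; with a single state, the bootstrap target is
        # the agent's own max Q-value.
        q1[a1] += alpha * (r1 + gamma * max(q1) - q1[a1])
        q2[a2] += alpha * (r2 + gamma * max(q2) - q2[a2])
        mutual_coop += (a1 == 0 and a2 == 0)
    return q1, q2, mutual_coop / steps
```

Because the two learners are only coupled through the rewards, runs of this kind can pass through extended phases of mutual cooperation before settling elsewhere; the paper's contribution is a deterministic map that explains such behavior, which this stochastic sketch does not implement.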