Abstract:
Multi-agent reinforcement learning involves interacting agents whose learning processes are coupled through a shared environment. This work introduces a discrete-time approximation model for multi-agent Boltzmann Q-learning that accounts for the agents' update frequencies. We demonstrate why previous models fail to accurately represent the actual stochastic learning dynamics, whereas our model reproduces several complex emergent dynamic regimes, including transient cooperation and metastable states in social dilemmas such as the Prisoner's Dilemma. We further show that increasing the discount factor can prevent convergence by inducing oscillations through a supercritical Neimark–Sacker bifurcation, which transforms the unique stable fixed point into a stable limit cycle. This analysis deepens our understanding of multi-agent learning dynamics and of the conditions under which convergence may fail.
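To make the setting concrete, the following is a minimal sketch of the stochastic process the abstract refers to: two independent Boltzmann Q-learners repeatedly playing the Prisoner's Dilemma. The payoff values, learning rate, discount factor, and temperature are illustrative assumptions, not the parameters analyzed in the paper, and the sketch simulates the raw learning process rather than the paper's discrete-time approximation model.

```python
import numpy as np

# Row player's payoff matrix; actions: 0 = cooperate, 1 = defect.
# Standard Prisoner's Dilemma payoffs (T=5 > R=3 > P=1 > S=0); the game
# is symmetric, so the column player uses the transposed indexing below.
PAYOFF = np.array([[3.0, 0.0],
                   [5.0, 1.0]])

rng = np.random.default_rng(0)
alpha, gamma, tau = 0.1, 0.9, 1.0   # learning rate, discount, temperature
                                    # (illustrative values, not the paper's)
Q = [np.zeros(2), np.zeros(2)]      # one Q-vector per agent (stateless game)

def boltzmann(q, tau):
    """Softmax (Boltzmann) action probabilities at temperature tau."""
    z = np.exp((q - q.max()) / tau)  # shift for numerical stability
    return z / z.sum()

for t in range(5000):
    # Each agent samples an action from its own Boltzmann policy.
    a0 = rng.choice(2, p=boltzmann(Q[0], tau))
    a1 = rng.choice(2, p=boltzmann(Q[1], tau))
    r0, r1 = PAYOFF[a0, a1], PAYOFF[a1, a0]
    # Stateless Q-learning update, bootstrapping on each agent's own max Q.
    Q[0][a0] += alpha * (r0 + gamma * Q[0].max() - Q[0][a0])
    Q[1][a1] += alpha * (r1 + gamma * Q[1].max() - Q[1][a1])

print("agent 0 policy:", boltzmann(Q[0], tau))
print("agent 1 policy:", boltzmann(Q[1], tau))
```

Because each agent's rewards depend on the other agent's evolving policy, the two Q-updates are coupled through the shared game, which is exactly the coupling that makes the joint dynamics non-trivial to model deterministically.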