Abstract:
Multi-agent reinforcement learning involves interacting agents whose learning processes are coupled through a shared environment. This work introduces a discrete-time approximation model for multi-agent Boltzmann Q-learning that accounts for the agents' update frequencies. We demonstrate why previous models fail to accurately represent the actual stochastic learning dynamics, whereas our model reproduces several complex emergent dynamic regimes, including transient cooperation and metastable states in social dilemmas such as the Prisoner's Dilemma. We further show that increasing the discount factor can prevent convergence by inducing oscillations through a supercritical Neimark–Sacker bifurcation, which transforms the unique stable fixed point into a stable limit cycle. This analysis deepens our understanding of multi-agent learning dynamics and of the conditions under which convergence may fail.
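To make the setting concrete, the following is a minimal sketch of the stochastic process the abstract refers to: two independent Boltzmann Q-learners repeatedly playing the Prisoner's Dilemma. The payoff values, learning rate, discount factor, and temperature are illustrative assumptions, not the parameters analyzed in the paper, and the sketch simulates the raw learning process rather than the paper's discrete-time approximation model.

```python
import numpy as np

# Row player's payoff matrix; actions: 0 = cooperate, 1 = defect.
# Standard Prisoner's Dilemma payoffs (T=5 > R=3 > P=1 > S=0); the game
# is symmetric, so the column player uses the transposed indexing below.
PAYOFF = np.array([[3.0, 0.0],
                   [5.0, 1.0]])

rng = np.random.default_rng(0)
alpha, gamma, tau = 0.1, 0.9, 1.0   # learning rate, discount, temperature
                                    # (illustrative values, not the paper's)
Q = [np.zeros(2), np.zeros(2)]      # one Q-vector per agent (stateless game)

def boltzmann(q, tau):
    """Softmax (Boltzmann) action probabilities at temperature tau."""
    z = np.exp((q - q.max()) / tau)  # shift for numerical stability
    return z / z.sum()

for t in range(5000):
    # Each agent samples an action from its own Boltzmann policy.
    a0 = rng.choice(2, p=boltzmann(Q[0], tau))
    a1 = rng.choice(2, p=boltzmann(Q[1], tau))
    r0, r1 = PAYOFF[a0, a1], PAYOFF[a1, a0]
    # Stateless Q-learning update, bootstrapping on each agent's own max Q.
    Q[0][a0] += alpha * (r0 + gamma * Q[0].max() - Q[0][a0])
    Q[1][a1] += alpha * (r1 + gamma * Q[1].max() - Q[1][a1])

print("agent 0 policy:", boltzmann(Q[0], tau))
print("agent 1 policy:", boltzmann(Q[1], tau))
```

Because each agent's rewards depend on the other agent's evolving policy, the two Q-updates are coupled through the shared game, which is exactly the coupling that makes the joint dynamics non-trivial to model deterministically.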