Cooperative Learning of Multi-Agent Systems via Reinforcement Learning

Wang, Xin; Zhao, Chen; Huang, Tingwen; Chakrabarti, Prasun; Kurths, Jürgen

doi:10.1109/TSIPN.2023.3239654

Journal Article

Authors

Wang, Xin
External Organizations;

Zhao, Chen
External Organizations;

Huang, Tingwen
External Organizations;

Chakrabarti, Prasun
External Organizations;

Kurths, Jürgen
Potsdam Institute for Climate Impact Research;

Citation

Wang, X., Zhao, C., Huang, T., Chakrabarti, P., Kurths, J. (2023): Cooperative Learning of Multi-Agent Systems via Reinforcement Learning. - IEEE Transactions on Signal and Information Processing over Networks, 9, 13-23.
https://doi.org/10.1109/TSIPN.2023.3239654

Cite as: https://publications.pik-potsdam.de/pubman/item/item_28321

Abstract

In many specific scenarios, accurate and practical cooperative learning is a common challenge in multi-agent systems. This study therefore focuses on cooperative learning algorithms for multi-agent systems and proposes an alternative data-based neural network reinforcement learning framework. To achieve data-based learning optimization, the proposed two-layer cooperative learning framework introduces a virtual learning objective. In the first layer, the followers learn the behaviors of the virtual objects using adaptive neural networks (NNs). Specifically, actor and critic NNs are used to acquire cooperative behaviors and to assess this layer's long-term utility function. The second layer then achieves tracking between the virtual objects and the leader by introducing a local data-based performance index. We formulate the resulting deterministic optimization problem and solve it with the policy iteration algorithm. This cooperative learning algorithm also retains good robustness properties and removes the dependence on prior knowledge of the multi-agent system model in the solution process. Finally, a multi-robot formation system demonstrates the practical appeal and effectiveness of the approach.
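
The abstract names policy iteration as the solver for the resulting deterministic optimization problem. For readers unfamiliar with the technique, the following is a minimal sketch of generic tabular policy iteration on a toy tracking task. It is not the authors' method: the paper describes a data-based actor-critic neural network scheme that needs no system model, whereas this sketch assumes known dynamics and a small discrete state space. The grid size, leader position, cost function, and all function names are hypothetical and chosen only for illustration.

```python
# Toy policy iteration: a follower on a 1-D grid learns to track a fixed leader.
# All names and numbers are illustrative assumptions, not taken from the paper.

N_STATES = 7          # follower positions 0..6
LEADER = 4            # hypothetical leader position to track
ACTIONS = (-1, 0, 1)  # move left, stay, move right
GAMMA = 0.9           # discount factor


def step(state, action):
    """Deterministic dynamics: move, then clip to the grid."""
    return min(max(state + action, 0), N_STATES - 1)


def cost(state, action):
    """Stage cost: squared tracking error plus a small control-effort penalty."""
    return (state - LEADER) ** 2 + 0.1 * abs(action)


def evaluate(policy, tol=1e-10):
    """Policy evaluation: iterate the Bellman equation for a fixed policy."""
    value = [0.0] * N_STATES
    while True:
        new_value = [
            cost(s, policy[s]) + GAMMA * value[step(s, policy[s])]
            for s in range(N_STATES)
        ]
        if max(abs(a - b) for a, b in zip(new_value, value)) < tol:
            return new_value
        value = new_value


def improve(value):
    """Policy improvement: choose the action with the lowest one-step lookahead cost."""
    return [
        min(ACTIONS, key=lambda a: cost(s, a) + GAMMA * value[step(s, a)])
        for s in range(N_STATES)
    ]


def policy_iteration():
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = [0] * N_STATES  # start from "always stay"
    while True:
        value = evaluate(policy)
        new_policy = improve(value)
        if new_policy == policy:
            return policy, value
        policy = new_policy


if __name__ == "__main__":
    policy, value = policy_iteration()
    print(policy)
```

In this toy setting the loop stops once the improved policy equals the previous one, and the resulting policy moves the follower toward the leader from either side and holds position once it is reached.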