Learning Near-Optimal Broadcasting Intervals in Decentralized Multi-Agent Systems using Online Least-Square Policy Iteration

Palunko, Ivana; Tolić, Domagoj; Prkačin, Vicko

Data source: CROSBI

Learning Near-Optimal Broadcasting Intervals in Decentralized Multi-Agent Systems using Online Least-Square Policy Iteration (CROSBI ID 291261)

Journal contribution | original scientific paper | international peer review

Palunko, Ivana ; Tolić, Domagoj ; Prkačin, Vicko. Learning Near-Optimal Broadcasting Intervals in Decentralized Multi-Agent Systems using Online Least-Square Policy Iteration // IET Control Theory and Applications, 15 (2021), 8; 1054-1067. doi: 10.1049/cth2.12102

Responsibility details

Authors

Palunko, Ivana ; Tolić, Domagoj ; Prkačin, Vicko

Basic data in the original language

Language

English

Title

Learning Near-Optimal Broadcasting Intervals in Decentralized Multi-Agent Systems using Online Least-Square Policy Iteration

Abstract

Here, agents learn how often to exchange information with neighbours in cooperative multi-agent systems (MASs) such that their linear quadratic regulator (LQR)-like performance indices are minimized. The investigated LQR-like cost functions capture trade-offs between the energy consumption of each agent and MAS local control performance in the presence of exogenous disturbances, delayed and noisy data. Agent energy consumption is critical for prolonging the MAS mission and is composed of both control (e.g. acceleration, velocity) and communication efforts. Taking provably stabilizing upper bounds on broadcasting intervals as optimization constraints, an online off-policy model-free learning algorithm based on least square policy iteration (LSPI) to minimize the cost function of each agent is employed. Consequently, the obtained broadcasting intervals adapt to the most recent information (e.g. delayed and noisy agents' inputs and/or outputs) received from neighbours whilst provably stabilizing the MAS. Chebyshev polynomials are utilized as the approximator in the LSPI whereas Kalman filtering handles sampled, corrupted, and delayed data. Subsequently, convergence and near-optimality of our LSPI scheme are inspected. The proposed methodology is verified experimentally using an inexpensive motion capture system and nano quadrotors.

Keywords

multi-agent systems ; decentralized control ; learning (artificial intelligence) ; optimal control

Note

not recorded

Basic data in other languages

not recorded

Publication details

Journal

IET Control Theory and Applications

Volume (issue)

15 (8)

Year

2021

Pages

1054-1067

Publication status

published

ISSN

1751-8644

e-ISSN

1751-8652

DOI

10.1049/cth2.12102

Related records

Related persons

Ivana Palunko (author)

Domagoj Tolić (author)

Vicko Prkačin (author)

Related institutions

RIT Croatia (322) (author's institution)

Sveučilište u Dubrovniku (275) (author's institution)

Related projects

Upravljanje dinamičkim sustavima (result of project work)

Field

Electrical Engineering, Mathematics, Computing

Links

doi.org

ietresearch.onlinelibrary.wiley.com

Indexed in

Scopus

Current Contents Connect (CCC)

Web of Science Core Collection, Science Citation Index Expanded (WoSCC-SCI-Exp)

Web of Science Core Collection, SCI-Exp, SSCI & A&HCI (WoSCC-SCI-Exp, SSCI, A&HCI)