Vision-Based Hierarchical Reinforcement Learning for Quadrotor UAV Navigation

Sun, Qiyu; Ji, Jiaxin; Mu, Jinzhen; Xu, Jing; Kocarev, Ljupco; Kurths, Jürgen

doi:10.1109/TMECH.2025.3596019

Journal Article

Authors

Sun, Qiyu (External Organizations)
Ji, Jiaxin (External Organizations)
Mu, Jinzhen (External Organizations)
Xu, Jing (External Organizations)
Kocarev, Ljupco (External Organizations)
Kurths, Jürgen (Potsdam Institute for Climate Impact Research)

Citation

Sun, Q., Ji, J., Mu, J., Xu, J., Kocarev, L., Kurths, J. (2025): Vision-Based Hierarchical Reinforcement Learning for Quadrotor UAV Navigation. - IEEE/ASME Transactions on Mechatronics, 30, 6, 4154-4164.
https://doi.org/10.1109/TMECH.2025.3596019

Cite as: https://publications.pik-potsdam.de/pubman/item/item_33877

Abstract

Vision-based reinforcement learning (RL) methods enable efficient policy learning and adaptive decision-making for quadrotor uncrewed aerial vehicle (QUAV) navigation in complex, high-dimensional flight environments. Although end-to-end vision-based RL approaches are effective, they often function as closed-box models and lack interpretability. We develop an explainable vision-based hierarchical RL algorithm for QUAV navigation that integrates perception, obstacle avoidance, and motion control into a unified framework. Because QUAV tasks involve high-dimensional state spaces and complex dynamics, rewards are often sparse and difficult to obtain, which hampers traditional RL methods. To address this, we introduce an echoic hindsight experience replay mechanism, which accelerates convergence by transforming failed episodes into successful ones. Building on this, we propose an RL-based proportional-integral-derivative-retarded control method that leverages multirate measurements to enhance low-level control performance, improving the maneuverability and precision of QUAV operations. Both simulated and real-world experiments demonstrate the effectiveness of the proposed method for UAV navigation in complex environments.
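The idea of "transforming failed episodes into successful ones" comes from hindsight experience replay (HER). The sketch below shows only the standard HER "final" relabeling strategy, not the echoic variant proposed in the article, whose details are not given in this abstract. It rests on several assumptions. The goal is a 3-D target position. The reward is sparse: 0 within a tolerance of the goal, otherwise -1. The names `Transition`, `sparse_reward`, and `her_relabel_final` are hypothetical and chosen for illustration. In typical HER usage, both the original and the relabeled transitions are stored in the replay buffer.

```python
import math
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec = Tuple[float, float, float]


@dataclass(frozen=True)
class Transition:
    obs: Vec        # hypothetical: UAV position used as the observation
    action: Vec     # hypothetical: commanded velocity
    next_obs: Vec
    goal: Vec       # desired goal position
    achieved: Vec   # position actually reached after taking the action
    reward: float


def sparse_reward(achieved: Vec, goal: Vec, tol: float = 0.5) -> float:
    """Assumed sparse reward: 0 if within `tol` of the goal, else -1."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def her_relabel_final(episode: List[Transition], tol: float = 0.5) -> List[Transition]:
    """Standard HER 'final' strategy: treat the last achieved position as the goal.

    Returns new transitions with the substituted goal and recomputed rewards;
    the original episode is left unchanged (transitions are immutable).
    """
    if not episode:
        return []
    new_goal = episode[-1].achieved
    return [
        replace(t, goal=new_goal, reward=sparse_reward(t.achieved, new_goal, tol))
        for t in episode
    ]


if __name__ == "__main__":
    # A failed episode: the UAV aimed for (10, 0, 0) but stopped at (2, 0, 0).
    goal = (10.0, 0.0, 0.0)
    ep = [
        Transition((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0), goal, (1.0, 0.0, 0.0), -1.0),
        Transition((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), goal, (2.0, 0.0, 0.0), -1.0),
    ]
    relabeled = her_relabel_final(ep)
    print([t.reward for t in relabeled])  # prints [-1.0, 0.0]
```

After relabeling, the final step carries a success reward. This gives the learner a non-trivial signal even when the original goal was never reached, which is the mechanism by which HER-style replay speeds up learning under sparse rewards.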