Released

Article

Vision-Based Hierarchical Reinforcement Learning for Quadrotor UAV Navigation

Authors

Sun,  Qiyu
External Organizations;

Ji,  Jiaxin
External Organizations;

Mu,  Jinzhen
External Organizations;

Xu,  Jing
External Organizations;

Kocarev,  Ljupco
External Organizations;

Kurths,  Jürgen
Potsdam Institute for Climate Impact Research;

Citation

Sun, Q., Ji, J., Mu, J., Xu, J., Kocarev, L., Kurths, J. (2025): Vision-Based Hierarchical Reinforcement Learning for Quadrotor UAV Navigation. - IEEE/ASME Transactions on Mechatronics, 30, 6, 4154-4164.
https://doi.org/10.1109/TMECH.2025.3596019


Cite as: https://publications.pik-potsdam.de/pubman/item/item_33877

Abstract
Vision-based reinforcement learning (RL) methods enable efficient policy learning and adaptive decision-making for quadrotor uncrewed aerial vehicle (QUAV) navigation in complex, high-dimensional flight environments. Although end-to-end vision-based RL approaches are effective, they often function as closed-box models that lack interpretability. We develop an explainable vision-based hierarchical RL algorithm for QUAV navigation that integrates perception, obstacle avoidance, and motion control into a unified framework. Because QUAV tasks have a high-dimensional state space and complex dynamics, traditional RL methods often suffer from sparse and difficult-to-obtain rewards. To address this, we introduce the echoic hindsight experience replay mechanism, which accelerates convergence by transforming failed episodes into successful ones. Building on this, we propose an RL-based proportional-integral-derivative-retarded control method that leverages multirate measurements to enhance low-level control performance, improving maneuverability and precision in QUAV operations. Both simulated and real-world experiments demonstrate the effectiveness of the proposed method for QUAV navigation in complex environments.
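
The idea of turning failed episodes into successful ones can be illustrated with a minimal sketch of standard hindsight experience replay using the "final" relabeling strategy. This is not the echoic variant proposed in the article, whose details are not given in the abstract. The `Transition` record, the `sparse_reward` function, and the 0.1 distance tolerance are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec = Tuple[float, float, float]  # assumed 3-D position, for illustration only


@dataclass(frozen=True)
class Transition:
    state: Vec       # position before the action
    action: Vec      # commanded displacement (hypothetical action format)
    next_state: Vec  # position reached after the action
    goal: Vec        # goal the agent was pursuing
    reward: float


def sparse_reward(achieved: Vec, goal: Vec, tol: float = 0.1) -> float:
    """Return 0 when the goal is reached within `tol`, otherwise -1."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Standard HER, 'final' strategy: pretend the last reached state was the goal.

    Every transition is copied with the substituted goal and a recomputed
    reward, so a failed episode yields at least one rewarded transition.
    """
    if not episode:
        return []
    new_goal = episode[-1].next_state
    return [
        replace(t, goal=new_goal, reward=sparse_reward(t.next_state, new_goal))
        for t in episode
    ]


# A failed episode: the goal is at x = 5, but the vehicle only reaches x = 2.
goal = (5.0, 0.0, 0.0)
failed_episode = [
    Transition((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0), goal, -1.0),
    Transition((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), goal, -1.0),
]
relabeled = relabel_with_final_goal(failed_episode)
```

In a replay buffer, both the original and the relabeled transitions would typically be stored, so the learner sees reward signal even when the real goal is never reached.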