Vision-Based Hierarchical Reinforcement Learning for Quadrotor UAV Navigation

Sun, Qiyu; Ji, Jiaxin; Mu, Jinzhen; Xu, Jing; Kocarev, Ljupco; Kurths, Jürgen

doi:10.1109/TMECH.2025.3596019

Status: Released

Genre: Article

Authors

Sun, Qiyu
External Organizations

Ji, Jiaxin
External Organizations

Mu, Jinzhen
External Organizations

Xu, Jing
External Organizations

Kocarev, Ljupco
External Organizations

Kurths, Jürgen
Potsdam Institute for Climate Impact Research

Citation

Sun, Q., Ji, J., Mu, J., Xu, J., Kocarev, L., Kurths, J. (2025): Vision-Based Hierarchical Reinforcement Learning for Quadrotor UAV Navigation. - IEEE/ASME Transactions on Mechatronics, 30, 6, 4154-4164.
https://doi.org/10.1109/TMECH.2025.3596019

Cite as: https://publications.pik-potsdam.de/pubman/item/item_33877

Abstract

Vision-based reinforcement learning (RL) methods enable efficient policy learning and adaptive decision-making for quadrotor uncrewed aerial vehicle (UAV) navigation in complex, high-dimensional flight environments. Although end-to-end vision-based RL approaches are effective, they often function as closed-box models that lack interpretability. We develop an explainable vision-based hierarchical RL algorithm for quadrotor UAV (QUAV) navigation that integrates perception, obstacle avoidance, and motion control into a unified framework. Because of the high-dimensional state space and complex dynamics of QUAV tasks, traditional RL methods often suffer from sparse, hard-to-obtain rewards. To address this, we introduce the echoic hindsight experience replay mechanism, which accelerates convergence by transforming failed episodes into successful ones. Building on this, we propose an RL-based proportional-integral-derivative-retarded control method that leverages multirate measurements to enhance low-level control performance, improving maneuverability and precision in QUAV operations. Both simulated and real-world experiments demonstrate the effectiveness of the proposed method for UAV navigation in complex environments.
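
The idea of turning failed episodes into successful ones can be illustrated with a minimal sketch of standard hindsight experience replay goal relabeling. This is not the echoic variant named in the abstract, whose details are not given here. The `Transition` record, the 2-D point goals, the `sparse_reward` function, and its tolerance are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Point = Tuple[float, float]  # assumed 2-D position, for illustration only


@dataclass(frozen=True)
class Transition:
    state: Point       # position before the step
    action: Point      # displacement applied
    next_state: Point  # position reached after the step
    goal: Point        # goal the agent was pursuing
    reward: float      # sparse reward: 0.0 on success, -1.0 otherwise


def sparse_reward(achieved: Point, goal: Point, tol: float = 0.1) -> float:
    """Hypothetical sparse reward: success only within `tol` of the goal."""
    dist = ((achieved[0] - goal[0]) ** 2 + (achieved[1] - goal[1]) ** 2) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    """Standard HER 'final' strategy: pretend the last state reached was the goal."""
    if not episode:
        return []
    new_goal = episode[-1].next_state
    return [
        replace(t, goal=new_goal, reward=sparse_reward(t.next_state, new_goal))
        for t in episode
    ]


# A failed episode: the agent aimed for (5, 5) but stopped at (2, 0).
failed = [
    Transition((0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (5.0, 5.0), -1.0),
    Transition((1.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 5.0), -1.0),
]
relabeled = relabel_with_final_state(failed)

# Both the original and the relabeled transitions are stored for training.
replay_buffer = failed + relabeled
```

In this sketch the final relabeled transition earns the success reward because its achieved position coincides with the substituted goal, so the learner sees a reward signal even though the original goal was never reached.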