Journal Article

Unsupervised Estimation of Monocular Depth and VO in Dynamic Environments via Hybrid Masks


Sun, Qiyu (External Organizations)
Tang, Yang (External Organizations)
Zhang, Chongzhen (External Organizations)
Zhao, Chaoqiang (External Organizations)
Qian, Feng (External Organizations)
Kurths, Jürgen (Potsdam Institute for Climate Impact Research)


Sun, Q., Tang, Y., Zhang, C., Zhao, C., Qian, F., Kurths, J. (2022): Unsupervised Estimation of Monocular Depth and VO in Dynamic Environments via Hybrid Masks. - IEEE Transactions on Neural Networks and Learning Systems, 33, 5, 2023-2033.

Cite as: https://publications.pik-potsdam.de/pubman/item/item_26573
Deep learning-based methods have achieved remarkable performance in 3-D sensing since they perceive environments in a biologically inspired manner. Nevertheless, existing approaches trained on monocular sequences are still prone to fail in dynamic environments. In this work, we mitigate the negative influence of dynamic environments on the joint estimation of depth and visual odometry (VO) through hybrid masks. Since both the VO estimation and the view reconstruction process in the joint estimation framework are vulnerable to dynamic environments, we propose the cover mask and the filter mask to alleviate their respective adverse effects. As depth and VO estimation are tightly coupled during training, the improved VO estimation promotes depth estimation as well. In addition, a depth-pose consistency loss is proposed to overcome the scale inconsistency between different training samples of monocular sequences. Experimental results show that both our depth prediction and our globally consistent VO estimation are state of the art when evaluated on the KITTI benchmark. We also evaluate our depth prediction model on the Make3D dataset to demonstrate the transferability of our method.
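The filter-mask idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the mask construction, array shapes, and the function name `masked_photometric_loss` are illustrative assumptions. The point shown is only that the photometric reconstruction error is averaged over pixels the mask marks as static, so dynamic-object regions do not corrupt the training signal.

```python
import numpy as np

def masked_photometric_loss(target, reconstructed, mask):
    """Mean absolute photometric error over masked-in pixels only.

    mask: 1.0 where a pixel is assumed static and correctly
    reconstructable, 0.0 where a dynamic object or occlusion would
    make the reconstructed view unreliable.
    """
    err = np.abs(target - reconstructed) * mask
    # Normalize by the number of valid pixels, not the full image size.
    return err.sum() / max(mask.sum(), 1e-8)

# Toy example: a moving object corrupts the right half of a 4x4 frame.
target = np.ones((4, 4))
recon = np.ones((4, 4))
recon[:, 2:] = 5.0          # large error where the dynamic object moved
mask = np.zeros((4, 4))
mask[:, :2] = 1.0           # mask keeps only the static left half

print(masked_photometric_loss(target, recon, mask))  # 0.0
```

Without the mask (all pixels weighted 1.0), the same toy frames give a mean error of 2.0, i.e. the dynamic region dominates the loss even though the static scene is reconstructed perfectly.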