Journal Article

Unsupervised Estimation of Monocular Depth and VO in Dynamic Environments via Hybrid Masks

Authors

Sun, Qiyu
External Organizations

Tang, Yang
External Organizations

Zhang, Chongzhen
External Organizations

Zhao, Chaoqiang
External Organizations

Qian, Feng
External Organizations

Kurths, Jürgen
Potsdam Institute for Climate Impact Research

Citation

Sun, Q., Tang, Y., Zhang, C., Zhao, C., Qian, F., Kurths, J. (2022): Unsupervised Estimation of Monocular Depth and VO in Dynamic Environments via Hybrid Masks. - IEEE Transactions on Neural Networks and Learning Systems, 33, 5, 2023-2033.
https://doi.org/10.1109/TNNLS.2021.3100895


Cite as: https://publications.pik-potsdam.de/pubman/item/item_26573
Abstract
Deep learning-based methods have achieved remarkable performance in 3-D sensing since they perceive environments in a biologically inspired manner. Nevertheless, existing approaches trained on monocular sequences are still prone to fail in dynamic environments. In this work, we mitigate the negative influence of dynamic environments on the joint estimation of depth and visual odometry (VO) through hybrid masks. Since both the VO estimation and the view reconstruction process in the joint estimation framework are vulnerable to dynamic environments, we propose the cover mask and the filter mask to alleviate their adverse effects, respectively. As depth and VO estimation are tightly coupled during training, the improved VO estimation promotes depth estimation as well. In addition, a depth-pose consistency loss is proposed to overcome the scale inconsistency between different training samples of monocular sequences. Experimental results show that both our depth prediction and our globally consistent VO estimation are state of the art when evaluated on the KITTI benchmark. We also evaluate our depth prediction model on the Make3D dataset to demonstrate the transferability of our method.
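
The abstract gives only a high-level description of the method. As a rough illustration of the two ingredients it names, the following PyTorch sketch shows a filter-mask-weighted photometric reconstruction loss and one plausible form of a depth-pose scale-consistency term. The function names, tensor shapes, and exact loss forms are assumptions made for illustration, not the authors' implementation.

# Minimal PyTorch sketch of a mask-weighted photometric loss and a
# depth-pose scale-consistency term in the spirit of the abstract.
# Mask construction, tensor shapes, and the exact form of each loss are
# assumptions, not the paper's implementation.
import torch

def masked_photometric_loss(target, synthesized, filter_mask):
    # target, synthesized: [B, 3, H, W] frames; filter_mask: [B, 1, H, W]
    # with 0 at pixels flagged as dynamic, so moving objects are excluded
    # from the view-reconstruction gradient.
    l1 = (target - synthesized).abs().mean(dim=1, keepdim=True)
    valid = filter_mask.float()
    return (l1 * valid).sum() / valid.sum().clamp(min=1.0)

def depth_pose_consistency(depth, translation):
    # depth: [B, 1, H, W]; translation: [B, 3] from the pose network.
    # A hypothetical scale-consistency penalty: the per-sample ratio of
    # translation magnitude to mean depth should be constant across the
    # batch, tying all monocular training samples to one common scale.
    ratio = translation.norm(dim=1) / depth.mean(dim=(1, 2, 3))
    return (ratio - ratio.mean()).abs().mean()

In an unsupervised pipeline of this kind, both terms would be summed with weighting factors into the total training loss; the filter mask suppresses dynamic pixels during view reconstruction, while the consistency term addresses the per-sample scale ambiguity of monocular training.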