Improving 360 Monocular Depth Estimation via Non-local Dense Prediction Transformer and Joint Supervised and Self-Supervised Learning

Ilwi Yun; Hyuk-Jae Lee; Chae Eun Rhee

AAAI 2022

Conference Paper; AAAI Technical Track on Computer Vision III; Artificial Intelligence

Abstract

Due to difficulties in acquiring ground truth depth for equirectangular (360°) images, the quality and quantity of equirectangular depth data available today are insufficient to represent the variety of scenes in the world. Therefore, 360° depth estimation studies that rely solely on supervised learning are destined to produce unsatisfactory results. Although self-supervised learning methods focusing on equirectangular images (EIs) have been introduced, they often have incorrect or non-unique solutions, causing unstable performance. In this paper, we propose 360° monocular depth estimation methods that improve on the areas that limited previous studies. First, we introduce a self-supervised 360° depth learning method that utilizes only gravity-aligned videos, which has the potential to eliminate the need for depth data during the training procedure. Second, we propose a joint learning scheme realized by combining supervised and self-supervised learning. The weaknesses of each learning method are compensated by the other, leading to more accurate depth estimation. Third, we propose a non-local fusion block, which can further retain the global information encoded by the vision transformer when reconstructing the depths. With the proposed methods, we successfully apply the transformer to 360° depth estimation, which, to the best of our knowledge, has not been tried before. On several benchmarks, our approach achieves significant improvements over previous works and establishes a new state of the art.
