PERCEPTION

Self supervised Visual Geometry Learning

Yiran Zhong

Publication year
2020
Citations
2

Abstract

Visual geometry learning aims to recover 3D geometry information, i.e., surface normals, depth maps, and camera poses, from images. As a classic task in computer vision, this problem has been studied extensively for decades. It covers depth completion, stereo matching, monocular depth estimation, optical flow, visual odometry, structure from motion, and related tasks. This thesis is dedicated to solving these problems from both conventional learning and deep learning perspectives. Like most data-driven methods, supervised deep learning-based methods require a large amount of labeled training data and suffer from limited generalization ability. Self-supervised learning is a technique that allows a network to learn feature representations without labeled data. In this thesis, we investigate the problem of applying self-supervised learning techniques to visual geometry learning and push the limit of the state of the art in terms of accuracy, speed, and generalization ability in visual geometry recovery tasks. In the depth completion task, two conventional optimization-based methods are proposed. The first one assumes that a dense depth map can be approximated by a weighted sum of a set of principal components and enforces this assumption as a global geometric constraint. A colour-guided auto-regression model is applied so that the estimated depth map has sharp object boundaries. The proposed method can be solved efficiently in closed form and outperforms previous methods. The other method further enforces a piecewise planar model on the depth completion task and formulates it as a continuous Conditional Random Field (CRF) optimization problem. Experiments show that the proposed method is faster and more accurate than previous methods. In the stereo matching task, we propose to solve this problem through a deep self-supervised framework.
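To make the first depth completion idea concrete, the following is a minimal sketch of fitting a weighted sum of principal components to sparse depth samples in closed form. It is an illustration under stated assumptions, not the thesis's implementation: the function name `complete_depth`, the `ridge` term, and the synthetic basis are hypothetical, and the colour-guided auto-regression term mentioned in the abstract is left out.

```python
import numpy as np

def complete_depth(sparse_depth, mask, basis, mean, ridge=1e-6):
    """Fill a flattened depth map from sparse samples using a linear basis.

    sparse_depth: (n,) depth values, meaningful only where mask is True
    mask:         (n,) boolean, True at measured pixels
    basis:        (n, k) principal components as columns (assumed given)
    mean:         (n,) mean depth map the basis is centred on
    ridge:        small Tikhonov term (an assumption, added for stability)
    """
    b_obs = basis[mask]                          # basis rows at measured pixels
    residual = sparse_depth[mask] - mean[mask]   # centred observations
    # Closed-form solution of the ridge-regularised normal equations:
    # (B^T B + ridge * I) w = B^T r
    gram = b_obs.T @ b_obs + ridge * np.eye(basis.shape[1])
    weights = np.linalg.solve(gram, b_obs.T @ residual)
    return mean + basis @ weights

# Toy data: a "depth map" that lies exactly in the span of a random basis.
rng = np.random.default_rng(0)
n, k = 200, 3
basis, _ = np.linalg.qr(rng.normal(size=(n, k)))   # orthonormal columns
mean = np.full(n, 5.0)
true_depth = mean + basis @ np.array([2.0, -1.0, 0.5])

mask = np.zeros(n, dtype=bool)
mask[rng.choice(n, size=20, replace=False)] = True  # 10% of pixels measured
sparse = np.where(mask, true_depth, 0.0)

dense = complete_depth(sparse, mask, basis, mean, ridge=1e-10)
max_error = np.abs(dense - true_depth).max()
```

Because the toy depth map lies exactly in the span of the basis, 20 samples are enough to recover all 200 values. Real depth maps only approximately satisfy the assumption, which is presumably why the abstract pairs it with an additional colour-guided term.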
Conventional optimization-based methods often require several seconds to minutes to process a sample, which makes them infeasible for time-critical applications such as autonomous driving and robotics. Moreover, supervised deep methods often require a large number of ground truth labels for training and suffer from limited generalization capability. By leveraging self-supervised learning, our self-supervised stereo matching networks do not need any labeled data and can adapt themselves to new scenarios on the fly. The key idea is to make several assumptions about the scene, formulate them as loss functions, and then optimize these losses through backpropagation. The loss functions are similar to the energy functions in conventional optimization-based methods, but we are free to use more complex loss functions that describe a scene more precisely. Experiments demonstrate that the proposed methods have better performance in terms of both speed and accuracy. A similar strategy is also applied to the LiDAR-Stereo fusion task. A "feedback loop" is proposed to deal with the noise in LiDAR measurements. We also extend stereo matching to the stereo video matching problem by utilizing convolutional LSTM modules to handle temporal consistency in videos. To deal with time-critical applications, we present a super-efficient stereo matching network structure that can process HD images at 100 FPS. We also leverage AutoML techniques, i.e., neural architecture search (NAS), to find an optimal architecture for deep stereo matching and achieve top-1 accuracy on various benchmarks with far fewer trainable parameters. We further define a new problem called single mixture image depth estimation. Here, the single image can be a mixture of a stereo pair in the form $I = \alpha I_{left} + (1 - \alpha) I_{right}$. Depending on the choice of $\alpha$, this task can be seen as red-cyan depth, double-vision depth, or monocular depth estimation.
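The mixture formula above can be illustrated with a short sketch. The function name `mix_stereo_pair` and the specific α values are assumptions made for illustration: the abstract states only that different choices of α yield the three cases, so the mapping used here (a per-channel α for red-cyan, α = 0.5 for double vision, α = 1 for monocular) is a plausible reading rather than a confirmed one.

```python
import numpy as np

def mix_stereo_pair(left, right, alpha):
    """Blend a stereo pair as alpha * left + (1 - alpha) * right.

    left, right: float arrays of shape (H, W, 3)
    alpha:       scalar, or per-channel weights of shape (3,) (assumed form)
    """
    alpha = np.asarray(alpha, dtype=left.dtype)
    return alpha * left + (1.0 - alpha) * right

rng = np.random.default_rng(0)
left = rng.random((4, 6, 3))
right = rng.random((4, 6, 3))

# Assumed alpha choices for the three cases named in the abstract.
red_cyan = mix_stereo_pair(left, right, [1.0, 0.0, 0.0])  # red from left, green/blue from right
double_vision = mix_stereo_pair(left, right, 0.5)         # equal blend of both views
monocular = mix_stereo_pair(left, right, 1.0)             # left view only
```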
Instead of regressing depth from a single image by brute force, we divide the task into two sub-tasks: image separation and stereo matching. We first decouple the mixed image through an image separation module and then do stereo matching on the separat
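The self-supervised stereo idea described in the abstract, turning scene assumptions into loss functions, can be sketched with one widely used assumption: photometric consistency between the two views of a rectified pair. The abstract does not say which assumptions the thesis uses, so this choice and the names `warp_right_to_left` and `photometric_loss` are illustrative. NumPy only evaluates the loss here; in a network the same computation would be written in a differentiable framework and minimized by backpropagation.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at x - d.

    right:     (H, W) grayscale right image of a rectified pair
    disparity: (H, W) left-view disparity in pixels
    """
    h, w = disparity.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity  # source x coordinates
    xs = np.clip(xs, 0.0, w - 1.0)                            # clamp at image borders
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0                                            # linear interpolation weight
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]

def photometric_loss(left, right, disparity):
    """Mean absolute difference between the left image and the warped right image."""
    return np.abs(left - warp_right_to_left(right, disparity)).mean()

# Toy pair: every pixel sits 3 px further right in the left view.
rng = np.random.default_rng(0)
right = rng.random((16, 64))
left = np.roll(right, 3, axis=1)

loss_true = photometric_loss(left, right, np.full(right.shape, 3.0))
loss_zero = photometric_loss(left, right, np.zeros(right.shape))
```

The loss is much lower for the correct disparity than for a zero-disparity guess, and that gap is the signal a network can learn from without ground truth labels. The small remaining error at the correct disparity comes from the image border, where the warp has no valid source pixel.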

Keywords

Artificial intelligence, Computer science, Geometry, Mathematics
