PERCEPTION

Swin-Depth: Using Transformers and Multi-Scale Fusion for Monocular-Based Depth Estimation

Zeyu Cheng, Yi Zhang, Chengkai Tang

Year published: 2021
Citations: 55

Abstract

Depth estimation from monocular sensors is a fundamental task in computer vision. It has a wide range of applications, including robot navigation and autonomous driving, and has received extensive attention from researchers in recent years. Monocular depth estimation has long been based on convolutional neural networks, but the convolution operation is inherently limited in modeling long-range dependencies. Replacing convolutional neural networks with Transformers is a promising alternative, but the computational complexity is too high and the number of parameters is too large. To address these problems, we propose Swin-Depth, a Transformer-based monocular depth estimation method that uses hierarchical representation learning with computational complexity linear in image size. In addition, Swin-Depth includes an attention module based on multi-scale fusion to strengthen the network's ability to capture global information. The proposed method effectively reduces the excessive parameter count of Transformer-based monocular depth estimation, and extensive experiments show that Swin-Depth achieves state-of-the-art performance on challenging datasets of indoor and outdoor scenes.
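
The linear-complexity claim can be illustrated with a rough cost count. The sketch below assumes that Swin-Depth follows the window-based self-attention of the Swin Transformer, as its name suggests; the abstract itself does not give these formulas. The function names `global_attention_cost` and `window_attention_cost`, the window size of 7, and the feature-map sizes are illustrative assumptions, not values taken from the paper.

```python
def global_attention_cost(h, w, c):
    """Approximate multiply-accumulate count for global self-attention
    on an h x w feature map with c channels: 4*h*w*c^2 + 2*(h*w)^2*c.
    The second term is quadratic in the number of tokens h*w."""
    n = h * w
    return 4 * n * c * c + 2 * n * n * c


def window_attention_cost(h, w, c, m=7):
    """Approximate cost when attention is restricted to non-overlapping
    m x m windows: 4*h*w*c^2 + 2*m^2*h*w*c.
    With m fixed, both terms are linear in the number of tokens h*w.
    The default m=7 is an assumed window size."""
    n = h * w
    return 4 * n * c * c + 2 * m * m * n * c


if __name__ == "__main__":
    # Hypothetical feature-map sizes, chosen only to show the scaling trend.
    for side in (56, 112, 224):
        g = global_attention_cost(side, side, 96)
        l = window_attention_cost(side, side, 96)
        print(f"{side}x{side}: global={g:.3e}  windowed={l:.3e}  ratio={g / l:.1f}")
```

Under these assumptions, doubling the side length of the feature map multiplies the windowed cost by exactly 4 (the growth in token count), while the global cost grows faster because of its quadratic term.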

Keywords

Fusion, Artificial intelligence, Computer science, Computer vision, Transformer, Measured depth, Sensor fusion, Geology, Engineering, Electrical engineering
