Depth Estimation with Simplified Transformer

John Yang, Le An, Anurag Dixit, Jinkyu Koo, Su Inn Park

Publication year: 2022
Citations: 11
Access: Open access

Abstract

Transformers and their variants have recently shown state-of-the-art results in many vision tasks, ranging from image classification to dense prediction. Despite their success, limited work has been reported on improving model efficiency for deployment in latency-critical applications such as autonomous driving and robotic navigation. In this paper, we aim to improve upon existing vision transformers and propose a method for self-supervised monocular Depth Estimation with Simplified Transformer (DEST), which is efficient and particularly suitable for deployment on GPU-based platforms. Through strategic design choices, our model significantly reduces model size, complexity, and inference latency while achieving superior accuracy compared to the state of the art. We also show that our design generalizes well to other dense prediction tasks without bells and whistles.

Keywords

Software deployment, Transformer, Computer science, Inference, Latency (audio), Ranging, Artificial intelligence, Monocular, Machine learning, Computer engineering
