Up-to-Down Network: Fusing Multi-Scale Context for 3D Semantic Scene Completion

Hao Zou, Xuemeng Yang, Tianxin Huang, Chujuan Zhang, Yong Liu, Wanlong Li, Feng Wen, Hongbo Zhang

Year
2021
Citations
20

Abstract

An efficient 3D scene perception algorithm is a vital component of autonomous driving and robotics systems. In this paper, we focus on semantic scene completion, the task of jointly estimating the volumetric occupancy and semantic labels of objects. Because real-world data is sparse and occluded, this task is extremely challenging. We propose a novel framework, the Up-to-Down network (UDNet), which performs large-scale semantic scene completion with an encoder-decoder architecture operating on voxel grids. The novel up-to-down block effectively aggregates multi-scale context information to improve labeling coherence, and an atrous spatial pyramid pooling module expands the receptive field while preserving detailed geometric information. In addition, the proposed multi-scale fusion mechanism efficiently aggregates global background information and improves semantic completion accuracy. To meet the needs of different tasks, UDNet can also perform multi-resolution semantic completion, trading resolution for speed with faster but coarser outputs. Detailed experiments on the SemanticKITTI semantic scene completion benchmark show that the proposed framework surpasses state-of-the-art methods by remarkable margins and runs at real-time inference speed while using only voxel grids as input.
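The idea behind atrous spatial pyramid pooling can be illustrated with a toy example. The sketch below is not UDNet's implementation: it works on a 1D signal in plain Python rather than on 3D voxel grids, the function names (`atrous_conv1d`, `receptive_field`, `aspp1d`) and the dilation rates are hypothetical, and the branches are fused by simple averaging rather than by the concatenation and learned projection typically used in practice. It only shows how spacing the kernel taps apart widens the receptive field while the output keeps the input resolution.

```python
def atrous_conv1d(signal, kernel, rate):
    """Same-size 1D convolution whose taps are `rate` samples apart.

    Positions outside the signal are treated as zeros, so the output
    has the same length as the input (no downsampling).
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # dilated tap position
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def receptive_field(kernel_size, rate):
    """Span, in samples, covered by one dilated kernel."""
    return (kernel_size - 1) * rate + 1


def aspp1d(signal, kernel, rates):
    """Toy pyramid: run parallel dilated branches and average them.

    Averaging is an assumption made for brevity; it stands in for the
    learned fusion a real network would use.
    """
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


if __name__ == "__main__":
    impulse = [0.0] * 9
    impulse[4] = 1.0
    kernel = [1.0, 1.0, 1.0]
    for rate in (1, 2, 3):  # hypothetical rates, chosen for illustration
        print(rate, receptive_field(len(kernel), rate),
              atrous_conv1d(impulse, kernel, rate))
    print(aspp1d(impulse, kernel, rates=(1, 2)))
```

Feeding a single impulse through each branch makes the effect visible: a larger rate spreads the response over a wider span, yet every output has the same length as the input, which is the property the abstract describes as expanding the receptive field while preserving detail.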

Keywords

Computer science, Artificial intelligence, Encoder, Semantics (computer science), Context (archaeology), Pyramid (geometry), Benchmark (surveying), Computer vision, Task (project management), Inference
