PERCEPTION

Point-Voxel and Bird-Eye-View Representation Aggregation Network for Single Stage 3D Object Detection

Kanglin Ning, Yanfei Liu, Yanzhao Su, Ke Jiang

Year: 2022
Citations: 23

Abstract

LiDAR-based 3D object detectors are widely used in autonomous and robotic systems. Efficient voxel-based models must downsample their feature space to reduce computation, which discards geometric information and limits their accuracy. To address this problem, this paper presents PVB-SSD (point-voxel and bird's-eye-view representation aggregation network for single-stage 3D object detection), a 3D detection framework in which a position-information input branch generates Fourier embedding features from the original point cloud to supplement the lost information. A global-former module integrates these Fourier embedding features with bird's-eye-view (BEV) features extracted by a 3D convolutional backbone. Because spatial-level features are progressively replaced by semantic-level features in the deeper layers of the network, a window-transformer spatial-semantic aggregation module fuses the two dynamically. Extensive experiments on the KITTI, Waymo and nuScenes datasets show that our model achieves excellent accuracy at relatively low computational cost.
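The abstract does not give the embedding formula, so the following is only a minimal sketch under stated assumptions. It uses a NeRF-style sinusoidal encoding (sin and cos of each coordinate at octave frequencies) as a stand-in for the paper's Fourier embedding, and illustrates why a position branch can recover detail lost to voxelization: two points that quantize to the same voxel still receive distinct embeddings. The function names `fourier_embed` and `voxel_index`, the frequency count, the 80 m normalization range and the 0.4 m voxel size are all hypothetical choices, not values from the paper.

```python
import numpy as np


def fourier_embed(points: np.ndarray, num_freqs: int = 6, max_range: float = 80.0) -> np.ndarray:
    """Map (N, 3) xyz coordinates to (N, 3 + 2 * 3 * num_freqs) features.

    Assumption: NeRF-style sinusoidal encoding at frequencies 2^k * pi;
    the paper's exact frequencies and normalization may differ.
    """
    # Scale coordinates to roughly [-1, 1] using a hypothetical sensor range.
    p = points / max_range                                  # (N, 3)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi           # (L,)
    angles = p[:, :, None] * freqs[None, None, :]           # (N, 3, L)
    n = p.shape[0]
    # Concatenate the scaled coordinates with their sin/cos terms, flattened per point.
    return np.concatenate(
        [p, np.sin(angles).reshape(n, -1), np.cos(angles).reshape(n, -1)], axis=1
    )


def voxel_index(points: np.ndarray, voxel_size: float = 0.4) -> np.ndarray:
    """Integer voxel coordinates; any sub-voxel position is discarded."""
    return np.floor(points / voxel_size).astype(np.int64)


# Two nearby LiDAR returns (metres) that fall into the same 0.4 m voxel.
pts = np.array([[10.05, 2.01, -1.0], [10.30, 2.35, -0.9]])
print(voxel_index(pts))                 # both rows -> [25, 5, -3]
emb = fourier_embed(pts)
print(emb.shape)                        # (2, 39)
print(np.allclose(emb[0], emb[1]))      # False: the embedding still separates them
```

The higher-frequency terms change quickly with small coordinate shifts, which is what lets such an embedding carry the fine geometry that the voxel grid quantizes away; how PVB-SSD actually injects these features (via its global-former module) is not specified beyond the abstract.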

Keywords

Computer science, Artificial intelligence, Point cloud, Computer vision, Voxel, Object detection, Computation, Representation, Feature extraction, Pattern recognition
