Point-Voxel and Bird-Eye-View Representation Aggregation Network for Single Stage 3D Object Detection
Kanglin Ning, Yanfei Liu, Yanzhao Su, Ke Jiang
Year: 2022
Citations: 23
Abstract
LiDAR-based 3D object detectors are widely used in autonomous and robotic systems. To stay efficient, voxel-based models downsample their feature space, which discards geometric information and limits their accuracy. To address this problem, this paper presents a 3D detection framework, the point-voxel and bird’s-eye-view representation aggregation network for single stage 3D object detection (PVB-SSD), in which a position information input branch generates Fourier embedding features from the original point cloud to supplement the lost information. A global-former module integrates the embedded Fourier features with bird’s-eye-view features extracted by a 3D convolution backbone. Because spatial-level features are progressively replaced by semantic-level features in the deeper layers of the network, a window transformer spatial-semantic aggregation module fuses the two dynamically. Extensive experiments on the KITTI, Waymo, and nuScenes datasets show that the model achieves high accuracy at relatively low computational cost.
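The abstract does not specify how the position information input branch computes its Fourier embedding. As a rough illustration only, the sketch below uses the common sinusoidal Fourier-feature mapping, in which each coordinate is passed through sine and cosine at octave-spaced frequencies. The function names `fourier_embed` and `embed_cloud`, the `num_bands` parameter, and the frequency schedule `2^k * pi` are assumptions made for this example, not details taken from the paper.

```python
import math
from typing import List, Sequence


def fourier_embed(point: Sequence[float], num_bands: int = 4) -> List[float]:
    """Map one (x, y, z) point to sin/cos features.

    Assumption: frequencies are 2^k * pi for k = 0..num_bands-1, as in
    common Fourier-feature encodings; the paper may use another schedule.
    """
    features: List[float] = []
    for coord in point:
        for k in range(num_bands):
            angle = (2.0 ** k) * math.pi * coord
            # Each frequency contributes one sine and one cosine value.
            features.append(math.sin(angle))
            features.append(math.cos(angle))
    return features


def embed_cloud(points: Sequence[Sequence[float]],
                num_bands: int = 4) -> List[List[float]]:
    """Apply the per-point embedding to every point in a cloud."""
    return [fourier_embed(p, num_bands) for p in points]


if __name__ == "__main__":
    # Hypothetical toy cloud with coordinates already normalized to [0, 1].
    cloud = [(0.0, 0.0, 0.0), (0.25, 0.5, 0.75)]
    embedded = embed_cloud(cloud)
    print(len(embedded), len(embedded[0]))
```

Under these assumptions, each point yields `3 * 2 * num_bands` values, so the default of four bands gives a 24-dimensional feature per point. In a detector such features would typically be scattered into a bird’s-eye-view grid before fusion, a step omitted here.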