December 2021
tl;dr: SOTA engineering effort with Lift-Splat-Shoot on BEV detection. Similar to CaDDN and DD3D.
Overall impression
This paper achieves SOTA performance on nuScenes 3D object detection. It assembles existing SOTA components and does not invent any new modules. The biggest innovation is a new data augmentation method in BEV space.
The BEV detection framework has four components:
- image-view encoder: SwinTransformer, ResNet
- view transformer: LSS
- BEV encoder: ResNet
- task-specific BEV head: CenterPoint
This work is later extended by BEVDet4D and BEVerse.
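The view transformer is the step that moves image features into BEV, so a toy version may help. The following is a minimal sketch of the lift and splat idea from LSS in pure Python, not the BEVDet implementation: `lift` spreads one pixel's feature vector over its depth bins via a softmax-weighted outer product, and `splat` sum-pools the resulting points into BEV cells. The function names, the dictionary-based grid, and the assumption that points are already expressed in ego coordinates are all illustrative choices.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lift(context, depth_logits):
    """Lift one pixel: outer product of its depth distribution and feature vector.

    Returns one feature vector per depth bin.
    """
    probs = softmax(depth_logits)
    return [[p * c for c in context] for p in probs]


def splat(points, cell_size, half_range):
    """Sum-pool (x, y, feature) points into a square BEV grid.

    The grid is a dict keyed by (col, row); points outside
    [-half_range, half_range) on either axis are dropped.
    """
    bev = {}
    for x, y, feat in points:
        if not (-half_range <= x < half_range and -half_range <= y < half_range):
            continue
        key = (int((x + half_range) // cell_size),
               int((y + half_range) // cell_size))
        acc = bev.setdefault(key, [0.0] * len(feat))
        for i, v in enumerate(feat):
            acc[i] += v
    return bev


# Hypothetical usage: one pixel with a 2-channel feature and 4 uniform depth bins,
# placed along a ray pointing forward from the ego vehicle.
frustum = lift([1.0, 2.0], [0.0, 0.0, 0.0, 0.0])
points = [(1.0 + d, 0.1, row) for d, row in enumerate(frustum)]
bev = splat(points, cell_size=0.8, half_range=51.2)
```

Because the depth weights sum to one, the total feature mass landing in the BEV grid equals the original pixel feature as long as every depth bin falls inside the grid.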
Key ideas
- Multi-camera BEVDet has far fewer data samples and thus suffers from severe overfitting.
- Image space data augmentation (IDA): similar to that in LSS.
- BEV data augmentation (BDA)
  - Features in BEV undergo flipping, scaling and rotation, with the corresponding GT undergoing the same augmentation
- BDA plays a more important role than IDA in training BEVDet.
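The key constraint in BDA is that the features and the GT see the same transform. Below is a minimal sketch on BEV coordinates rather than on a feature tensor, assuming boxes are given as `(x, y, width, length, yaw)` in the ego frame; the function names, the parameter names, and the order of operations (rotate, scale, then flip) are illustrative assumptions, not taken from the paper.

```python
import math


def augment_point(x, y, angle, scale, negate_x=False, negate_y=False):
    """Rotate a BEV point about the ego origin, scale it, then optionally flip."""
    c, s = math.cos(angle), math.sin(angle)
    xr, yr = c * x - s * y, s * x + c * y
    xr, yr = xr * scale, yr * scale
    if negate_x:
        xr = -xr
    if negate_y:
        yr = -yr
    return xr, yr


def augment_box(box, angle, scale, negate_x=False, negate_y=False):
    """Apply the same transform to a GT box (x, y, width, length, yaw)."""
    x, y, w, l, yaw = box
    x, y = augment_point(x, y, angle, scale, negate_x, negate_y)
    yaw = yaw + angle
    if negate_x:
        # Heading (cos, sin) becomes (-cos, sin).
        yaw = math.pi - yaw
    if negate_y:
        # Heading (cos, sin) becomes (cos, -sin).
        yaw = -yaw
    return (x, y, w * scale, l * scale, yaw)


# Hypothetical usage: one box, rotated by 0.2 rad, scaled by 5%, flipped in x.
params = dict(angle=0.2, scale=1.05, negate_x=True, negate_y=False)
new_box = augment_box((10.0, 5.0, 2.0, 4.0, 0.3), **params)
```

A useful sanity check is that a point on the box, such as the midpoint of its front edge, lands in the same place whether it is transformed directly or recomputed from the augmented box.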
Technical details
- SOTA image-view methods as of late 2021 include FCOS3D and PGD.
- BEV FOV: output space of 51.2 m at a resolution of 0.8 m. --> How about in the front?
- Trained with CBGS, like in CenterPoint.
- BEVDet exceeds classic lidar-based methods such as PointPillars.
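The BEV FOV numbers above translate into a grid size with a little arithmetic. The sketch below assumes the 51.2 m is a half-extent on each side of the ego vehicle along both axes, which these notes do not state; under that assumption a 0.8 m cell gives a 128 x 128 grid. `bev_grid_size` and `to_cell` are hypothetical helper names.

```python
import math


def bev_grid_size(half_range, cell_size):
    """Number of cells per axis for a square grid covering [-half_range, half_range)."""
    return round(2 * half_range / cell_size)


def to_cell(x, y, half_range, cell_size):
    """Map an ego-frame point in meters to (col, row), or None if out of range."""
    if not (-half_range <= x < half_range and -half_range <= y < half_range):
        return None
    col = math.floor((x + half_range) / cell_size)
    row = math.floor((y + half_range) / cell_size)
    return col, row


n = bev_grid_size(51.2, 0.8)             # cells per axis under the symmetric assumption
origin_cell = to_cell(0.1, 0.1, 51.2, 0.8)
```

If the range were instead forward-only or asymmetric, only the offset added before dividing by the cell size would change.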
Notes