November 2020
tl;dr: Improve point embedding with dynamic voxelization and multiview fusion.
Overall impression
This paper is from the 1st author of VoxelNet.
Both VoxelNet and PointPillars use PointNet to learn point embeddings, then generate a pseudo-3D volume or a pseudo-2D image so that 3D or 2D convolution can be applied. This paper improves the point embedding process by aggregating multiple views, and is a plug-and-play module that can be integrated into PointPillars.
Note that PointPillars and its successor MVF both still use anchors for prediction. The entire procedure is not well described in the paper. See Pillar OD for a better description.
Key ideas
- Multiview architecture
- Voxelize points into pillars in BEV, spherical view, or cylindrical view.
- Extract pillar features.
- Project pillar features to points with nearest neighbor or bilinear interpolation and concat to point features.
- Transform point features to BEV
- Detection backbone + head
- Dynamic voxelization: instead of filling a fixed-size array (K max voxels x T max points per voxel x F feature dims), it maintains a dictionary of the point-voxel relationship.
- The points in each dynamic voxel (which holds a varying number of points) are aggregated into a fixed-length feature by max pooling, similar to PointNet.
- This avoids information loss (no points or voxels are dropped) and saves memory (no padding of mostly empty voxels).
- Multiview fusion: voxelization in Cartesian view (X x Y x Z) with Z as the channel dim, and in spherical view (elevation x azimuth x radius) with radius as the channel dim.
- The features are extracted with a convolution tower that maintains the tensor shape.
- The features from the Cartesian path, the spherical path, and the original encoded features are concatenated. This enhances the original encoded features.
- The enhanced point features can be used in structures like VoxelNet and PointPillars.
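The key ideas above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names, the 0.5 m cell size, and the 2 degree angular steps are assumptions made for the example, the convolution tower is omitted, and the voxel-to-point projection is a plain nearest-neighbor gather.

```python
import numpy as np

def dynamic_voxel_max_pool(point_feats, voxel_ids):
    """Max-pool per-point features into one feature per non-empty voxel.

    point_feats: (N, F) float array.
    voxel_ids:   (N, D) integer grid coordinates of each point.
    Returns voxel_feats (V, F) and point_to_voxel (N,), the row of
    voxel_feats that each point belongs to.
    """
    # The inverse index plays the role of the point-to-voxel dictionary.
    _, point_to_voxel = np.unique(voxel_ids, axis=0, return_inverse=True)
    point_to_voxel = point_to_voxel.reshape(-1)  # shape varies by NumPy version
    num_voxels = int(point_to_voxel.max()) + 1
    voxel_feats = np.full((num_voxels, point_feats.shape[1]), -np.inf)
    # Unbuffered max: every point contributes, none is dropped.
    np.maximum.at(voxel_feats, point_to_voxel, point_feats)
    return voxel_feats, point_to_voxel

def cartesian_ids(xyz, cell=0.5):
    """BEV pillar index (x, y); Z is left to the feature channels."""
    return np.floor(xyz[:, :2] / cell).astype(np.int64)

def spherical_ids(xyz, az_step=np.deg2rad(2.0), el_step=np.deg2rad(2.0)):
    """Spherical-view index (elevation, azimuth); radius is not binned."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    azimuth = np.arctan2(y, x)
    elevation = np.arctan2(z, np.hypot(x, y))
    ids = np.stack([np.floor(elevation / el_step),
                    np.floor(azimuth / az_step)], axis=1)
    return ids.astype(np.int64)

def multiview_point_features(xyz, point_feats):
    """Concatenate original point features with per-view voxel features."""
    fused = [point_feats]
    for ids in (cartesian_ids(xyz), spherical_ids(xyz)):
        voxel_feats, point_to_voxel = dynamic_voxel_max_pool(point_feats, ids)
        # A convolution tower would process voxel_feats here (omitted).
        fused.append(voxel_feats[point_to_voxel])  # gather back to points
    return np.concatenate(fused, axis=1)

rng = np.random.default_rng(0)
xyz = rng.uniform(-10.0, 10.0, size=(1000, 3))
feats = rng.normal(size=(1000, 8))
fused = multiview_point_features(xyz, feats)
print(fused.shape)  # (1000, 24): 8 original + 8 Cartesian + 8 spherical
```

Because the output is still one feature vector per point, it can be voxelized once more in BEV and passed to a PointPillars-style backbone and head.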
Technical details
- Dynamic voxelization can bring some KPI improvement (avoid info loss), but multiview fusion brings much more.
- The lookup table is not readily available and needs a custom CUDA kernel implementation to reach good speed.
- The runtime may not be constant, as it depends on the number of points in the frame.
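To make the information-loss point concrete, here is a toy model of hard voxelization with a fixed K x T buffer. It is a sketch under stated assumptions: `hard_voxelize` is a hypothetical helper, and it keeps the first T points per voxel in arrival order, whereas real pipelines typically sample points randomly.

```python
import numpy as np

def hard_voxelize(voxel_ids, max_voxels, max_points):
    """Return a boolean mask of the points that fit in a (K, T) buffer.

    voxel_ids:  (N, D) integer grid coordinates of each point.
    max_voxels: K, the number of voxel slots in the buffer.
    max_points: T, the number of point slots per voxel.
    """
    kept = np.zeros(len(voxel_ids), dtype=bool)
    fill = {}  # voxel key -> number of points stored so far
    for i, key in enumerate(map(tuple, voxel_ids)):
        if key not in fill:
            if len(fill) >= max_voxels:
                continue  # no free voxel slot: the point is dropped
            fill[key] = 0
        if fill[key] < max_points:
            fill[key] += 1
            kept[i] = True  # otherwise the voxel is full: point dropped
    return kept

# Ten points in one dense voxel, then three isolated points.
ids = np.array([[0, 0]] * 10 + [[1, 0], [2, 0], [3, 0]])
kept = hard_voxelize(ids, max_voxels=2, max_points=4)
print(int(kept.sum()), "of", len(ids), "points kept")  # 5 of 13 points kept
```

Dynamic voxelization keeps all 13 points here, at the cost of a workload that scales with the point count instead of with the fixed K x T buffer.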
Notes
- Review on Zhihu (知乎)
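- Why does voxelization in spherical view help?
- A common problem with voxels is that at long range, where the points are already sparse, cutting with the original voxel size often over-segments them, so the extracted voxel features are poor. Cutting larger voxels at long range via a frustum effectively improves the far-range voxel features. The paper therefore experiments with fusing voxels from the two views and shows that this is indeed effective.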
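The geometry behind this note is simple: an angular bin of width Δθ covers an arc of roughly r · Δθ at range r, so spherical voxels grow with distance while Cartesian cells stay fixed. The snippet below is a small worked example; the 1 degree azimuth step and the 0.3 m Cartesian cell are illustrative values, not settings taken from the paper.

```python
import math

def azimuth_bin_width_m(range_m, az_step_deg):
    """Arc length in metres covered by one azimuth bin at a given range."""
    return range_m * math.radians(az_step_deg)

def crossover_range_m(cell_m, az_step_deg):
    """Range at which an azimuth bin becomes as wide as a Cartesian cell."""
    return cell_m / math.radians(az_step_deg)

AZ_STEP_DEG = 1.0  # assumed angular resolution
CELL_M = 0.3       # assumed Cartesian cell size

for r in (5.0, 20.0, 50.0, 75.0):
    width = azimuth_bin_width_m(r, AZ_STEP_DEG)
    print(f"range {r:5.1f} m -> azimuth bin width {width:.2f} m")

print(f"bins exceed the {CELL_M} m cell beyond "
      f"{crossover_range_m(CELL_M, AZ_STEP_DEG):.1f} m")
```

Past the crossover range each spherical voxel covers more ground than a Cartesian cell, so it pools over more of the sparse far-range points.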