What You See is What You Get: Exploiting Visibility for 3D Object Detection

June 2020

tl;dr: Visibility augmented deep voxel representation, with occupancy grid feature map.

Overall impression

Representing a point cloud as a set of xyz points fundamentally erases the distinction between free space and unknown (uncertain) space, since both contain no lidar points. A convolution operating on such a representation cannot tell the two apart.

WYSIWYG adds an occupancy grid feature map to the existing feature map, much like CoordConv and CamConv append extra channels to their inputs.

Key ideas

Online occupancy mapping
- Follows OctoMap: fast ray casting via voxel traversal -> visibility volume.
- Integration over time with Bayesian filtering.
- Uses the same discretization as the BEV map. Voxel size 0.25 x 0.25 x 0.25 m^3.
- FoV [-50, 50] x [-50, 50] x [-5, 3] m -> 400 x 400 x 32 occupancy grid.
Visibility over multiple lidar sweeps: Bayesian filtering
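The occupancy mapping steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the traversal is a simple Amanatides-Woo style voxel walk, the grid is a sparse dict keyed by voxel index, and the log-odds increments and clamping bounds (`L_FREE`, `L_OCC`, `L_MIN`, `L_MAX`) are assumed values chosen for the example. Only the voxel size and field of view come from the notes.

```python
import math

VOXEL = 0.25                   # voxel edge length in metres (from the notes)
LO = (-50.0, -50.0, -5.0)      # lower corner of the field of view
HI = (50.0, 50.0, 3.0)         # upper corner of the field of view
SHAPE = tuple(round((h - l) / VOXEL) for l, h in zip(LO, HI))  # (400, 400, 32)

L_FREE, L_OCC = -0.4, 0.85     # assumed log-odds increments (illustrative)
L_MIN, L_MAX = -2.0, 3.5       # assumed clamping bounds (illustrative)


def to_voxel(p):
    """Map a metric point to integer voxel indices."""
    return tuple(int(math.floor((c - l) / VOXEL)) for c, l in zip(p, LO))


def in_grid(v):
    return all(0 <= i < n for i, n in zip(v, SHAPE))


def traverse(origin, end):
    """Yield the voxels crossed by the segment origin -> end, in order."""
    v = list(to_voxel(origin))
    v_end = to_voxel(end)
    d = [e - o for o, e in zip(origin, end)]
    step, t_max, t_delta = [], [], []
    for a in range(3):
        if d[a] > 0:
            step.append(1)
            t_max.append((LO[a] + (v[a] + 1) * VOXEL - origin[a]) / d[a])
            t_delta.append(VOXEL / d[a])
        elif d[a] < 0:
            step.append(-1)
            t_max.append((LO[a] + v[a] * VOXEL - origin[a]) / d[a])
            t_delta.append(-VOXEL / d[a])
        else:
            step.append(0)
            t_max.append(math.inf)
            t_delta.append(math.inf)
    yield tuple(v)
    while tuple(v) != v_end:
        a = t_max.index(min(t_max))   # axis whose voxel boundary is hit next
        if t_max[a] > 1.0:            # next boundary lies beyond the end point
            break
        v[a] += step[a]
        t_max[a] += t_delta[a]
        yield tuple(v)


def integrate_sweep(log_odds, origin, points):
    """Log-odds (Bayesian) update of a sparse occupancy dict from one sweep."""
    for p in points:
        hit = to_voxel(p)
        for v in traverse(origin, p):
            if v != hit and in_grid(v):   # voxels the ray passed through: free
                log_odds[v] = max(L_MIN, log_odds.get(v, 0.0) + L_FREE)
        if in_grid(hit):                  # voxel containing the return: occupied
            log_odds[hit] = min(L_MAX, log_odds.get(hit, 0.0) + L_OCC)
    return log_odds


def occupancy_prob(l):
    """Convert log-odds to probability; voxels never touched stay at 0.5."""
    return 1.0 - 1.0 / (1.0 + math.exp(l))
```

Calling `integrate_sweep` once per sweep, with every sweep expressed in the same frame, accumulates evidence over time. Voxels absent from the dict are the unknown space that a raw xyz representation cannot distinguish from free space.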
Data augmentation
- Naive data augmentation: copy and paste rare objects such as buses into the scene. This can violate visibility constraints.
- Visibility-aware data augmentation
- Drilling works better than culling because culling rejects too many placements as invalid, especially for large vehicles, which are hard to place anywhere without violating visibility.
The idea of ray casting is also widely used to generate simulated lidar data, as in LiDARsim.
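The difference between naive, culling, and drilling augmentation can be illustrated with a small 2D bird's-eye-view sketch. It rests on one reading of the two strategies: culling drops a pasted object when the rays to it are blocked by the existing scene, while drilling keeps the object and removes the blocking scene points. The function names, the sensor at the origin, and the dense-sampling ray walk (a stand-in for exact voxel traversal) are all assumptions made for this example.

```python
import math

CELL = 0.25  # BEV cell size in metres (matches the voxel size in the notes)


def cell_of(x, y):
    """Integer BEV cell containing the point (x, y)."""
    return (math.floor(x / CELL), math.floor(y / CELL))


def cells_on_ray(target, origin=(0.0, 0.0)):
    """Cells between the sensor and a target point, found by dense sampling.

    This approximates exact voxel traversal; the target's own cell is excluded.
    """
    dx, dy = target[0] - origin[0], target[1] - origin[1]
    n = max(1, int(math.hypot(dx, dy) / (CELL / 4)))
    end = cell_of(*target)
    cells = []
    for i in range(n + 1):
        c = cell_of(origin[0] + dx * i / n, origin[1] + dy * i / n)
        if c != end and c not in cells:
            cells.append(c)
    return cells


def paste_object(scene, obj, mode):
    """Paste `obj` points into `scene` points under a visibility rule.

    mode="naive": paste unconditionally.
    mode="cull":  reject the object if any ray to it crosses an occupied cell.
    mode="drill": keep the object and delete the scene points blocking its rays.
    """
    occupied = {cell_of(x, y) for x, y in scene}
    blocked = set()
    for p in obj:
        blocked.update(c for c in cells_on_ray(p) if c in occupied)
    if mode == "cull" and blocked:
        return list(scene)
    if mode == "drill":
        scene = [q for q in scene if cell_of(*q) not in blocked]
    return list(scene) + list(obj)


# A short wall at x = 5.1 m and a pasted object placed behind it.
wall = [(5.1, k * 0.25 + 0.1) for k in range(-4, 4)]
bus = [(10.1, 0.1), (10.1, 0.35)]
```

With this toy scene, `paste_object(wall, bus, "cull")` returns the wall unchanged, whereas `"drill"` keeps the bus and removes only the wall point that occludes it. The larger the pasted object, the more rays must be unobstructed, which is consistent with the note that culling rejects too many placements for large vehicles.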

Technical details

A reconstructed point cloud and a measured point cloud (lidar sweeps) differ in that the measured one also carries visibility information.
- Lidar sweeps are 2.5D
- True 3D point cloud data are, for example, sampled from mesh models
Two crucial recent innovations in training lidar object detectors are object augmentation and temporal aggregation. They were first proposed in SECOND and have since been used in SOTA methods such as PointPillars. The temporally aggregated lidar frames should be motion compensated.
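Motion compensation for temporal aggregation can be sketched as follows. This is a simplified planar version under stated assumptions: ego poses are given as hypothetical `(x, y, yaw)` tuples in a common world frame, points are 2D, and each aggregated point carries the sweep's age as an extra channel. It compensates for ego motion only, so independently moving objects still leave trails.

```python
import math


def to_world(pose, pt):
    """Apply an ego pose (x, y, yaw) to a point given in that ego frame."""
    x, y, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return (x + c * pt[0] - s * pt[1], y + s * pt[0] + c * pt[1])


def to_ego(pose, pt):
    """Inverse of to_world: express a world point in the ego frame of `pose`."""
    x, y, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    dx, dy = pt[0] - x, pt[1] - y
    return (c * dx + s * dy, -s * dx + c * dy)


def aggregate(sweeps, current_pose, current_time):
    """Merge sweeps into the current ego frame.

    `sweeps` is a list of (pose, timestamp, points). Each output point is
    (x, y, dt), where dt is the age of its sweep in seconds.
    """
    merged = []
    for pose, stamp, points in sweeps:
        for pt in points:
            x, y = to_ego(current_pose, to_world(pose, pt))
            merged.append((x, y, current_time - stamp))
    return merged
```

For a static object, points from an older sweep land on top of points from the current sweep after this transform. Without it, the same object would appear smeared along the direction of ego motion.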
Training follows the one-cycle policy, which is also used in CenterPoint and CBGS.
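For reference, a one-cycle learning-rate schedule can be written in a few lines. This is a generic sketch of the policy, not the configuration used in the paper: the peak rate, `div_factor`, `pct_start`, and the cosine shape of both phases are illustrative assumptions.

```python
import math


def one_cycle_lr(step, total_steps, max_lr=3e-3, div_factor=10.0,
                 pct_start=0.4, final_lr=0.0):
    """Learning rate at `step`: cosine warm-up to max_lr, then cosine decay.

    All default values are illustrative assumptions.
    """
    base_lr = max_lr / div_factor
    warm = int(total_steps * pct_start)
    if step < warm:
        # Phase 1: rise from base_lr to max_lr.
        t = step / warm
        return base_lr + (max_lr - base_lr) * (1 - math.cos(math.pi * t)) / 2
    # Phase 2: fall from max_lr towards final_lr.
    t = (step - warm) / max(1, total_steps - warm)
    return final_lr + (max_lr - final_lr) * (1 + math.cos(math.pi * t)) / 2


schedule = [one_cycle_lr(s, 100) for s in range(100)]
```

In practice the momentum is usually cycled in the opposite direction to the learning rate. PyTorch provides a ready-made scheduler, `torch.optim.lr_scheduler.OneCycleLR`, which is likely a better choice than hand-rolling the schedule.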

Notes

Review about data representation
Talk at CVPR 2020
GitHub code. The codebase is based on SECOND, similar to PointPillars.
Occupancy grid maps, lecture by Cyrill Stachniss