June 2020
tl;dr: Lane detection via row-wise pixel classification, from Qualcomm Korea.
Overall impression
This is inspired by GM Israel's drivable-space prediction method StixelNetV2.
It translates the lane detection problem into a row-wise classification task, which exploits the innate shape of lane markers (one x value per y); this is exactly the opposite of DS prediction (one y value per x), as shown in StixelNetV2.
In contrast, a pixel segmentation approach requires postprocessing such as a pixel-wise clustering algorithm.
Key ideas
- The network squeezes the feature map via a sequence of HRMs (horizontal reduction modules) to reduce the x dim to 1. Each HRM has two parallel branches: a normal horizontal x-pooling branch, and a spatial-to-channel branch followed by a 1x1 channel reduction. The two branches are then fused and followed by an SE block. The spatial-to-channel rearrangement is also called a horizontal pixel unshuffle layer; it is the reverse of the pixel shuffle operation used in subpixel convolution.
- The x position is predicted not by regression but by a row-wise classification layer. This is a repeated pattern in neural networks: classification works better than regression (including keypoint detection, depth estimation, anchor-free object detection, etc.). A simple CE loss works well enough.
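The horizontal pixel unshuffle mentioned above can be sketched with plain array reshaping. This is a minimal numpy illustration of the space-to-channel branch (the function name and the reduction ratio `r` are my own, not from the paper):

```python
import numpy as np

def horizontal_pixel_unshuffle(x, r=2):
    """Rearrange width into channels: (C, H, W) -> (C*r, H, W//r).

    Reverse of the (horizontal) pixel shuffle used in subpixel
    convolution; hypothetical helper illustrating the
    space-to-channel branch of an HRM.
    """
    c, h, w = x.shape
    assert w % r == 0, "width must be divisible by the reduction ratio"
    x = x.reshape(c, h, w // r, r)      # split width into groups of r
    x = x.transpose(0, 3, 1, 2)         # move the group dim next to channels
    return x.reshape(c * r, h, w // r)  # fold it into the channel dim

x = np.arange(2 * 2 * 4).reshape(2, 2, 4)  # (C=2, H=2, W=4)
y = horizontal_pixel_unshuffle(x, r=2)
print(y.shape)  # (4, 2, 2)
```

Stacking such layers (with the pooling branch in parallel) shrinks W toward 1 while preserving the horizontal information in the channel dimension, instead of discarding it as plain pooling would.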
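The row-wise classification idea can be made concrete as follows: each image row gets a softmax over the W candidate x positions, trained with cross-entropy against the ground-truth column. A minimal numpy sketch (the head shape and decoding are my simplification; the paper's exact head may differ, e.g. extra handling for rows without a lane marker):

```python
import numpy as np

def rowwise_ce(logits, target_cols):
    """Cross-entropy over x positions, one classification per row.

    logits: (H, W) scores for each row's lane-marker column.
    target_cols: (H,) ground-truth column index per row.
    """
    # numerically stable log-softmax along the width (class) dimension
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(target_cols)), target_cols].mean()

def decode(logits):
    """At inference, argmax per row gives one x per y."""
    return logits.argmax(axis=1)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8))        # toy head output for 4 rows
target = np.array([1, 2, 3, 4])
loss = rowwise_ce(logits, target)
print(decode(logits).shape)  # (4,)
```

Note that decoding needs no clustering step: the "one x per y" structure is baked into the output layout, which is the advantage over pixel segmentation noted above.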
Technical details
- In face landmark detection, a KL loss based on the Laplace distribution is often used (Laplace Landmark Localization, ICCV 2019).
- Trained with AdamW (an upgraded Adam), similar to DETR.
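For reference on the Laplace-based loss family mentioned above, the per-point negative log-likelihood under a Laplace distribution with predicted location and scale looks like this. A sketch only, with names of my choosing, not the exact formulation of the cited paper:

```python
import numpy as np

def laplace_nll(pred, target, log_b):
    """Negative log-likelihood under Laplace(pred, b), b = exp(log_b).

    NLL = |pred - target| / b + log(2b).
    Predicting log_b lets the network express per-landmark uncertainty:
    easy points get small b (sharp distribution), hard points large b.
    """
    b = np.exp(log_b)
    return (np.abs(pred - target) / b + np.log(2.0 * b)).mean()

# perfect prediction with unit scale: only the log(2b) term remains
print(laplace_nll(np.array([1.0]), np.array([1.0]), np.array([0.0])))
# ≈ 0.693 (= log 2)
```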
Notes
- This seems to be what most industry players are doing for lane detection. See 3D LaneNet.