August 2020
tl;dr: Use an RNN to draw DAG boundaries of lane lines.
Overall impression
There are several works from Uber ATG that extract polyline representations from BEV maps.
This is one application of RNNs to boundary extraction. Previous work includes Polygon-RNN, Polygon-RNN++, and Curve GCN, also from Uber ATG. The main idea is to create a structured boundary representation to boost the efficiency of human-in-the-loop annotation.
Polyline Loss focuses on the easier lane topology on highways, whereas DAGMapper also targets highway driving but focuses on hard cases such as forks and merges. PolyMapper only extracts the road network and does not have lane-level information.
The tool is RNN-based and thus autoregressive, so it does not have a constant runtime for images with a varying number of nodes.
The way DAGMapper defines nodes (control points) and calculates their loss is very insightful. There is no unique way to define control points, so instead of directly regressing the L1/L2 distance between predicted and annotated control points, a Chamfer distance loss is used, which calculates the normalized distance between two densely sampled curves. –> This idea actually comes from Polyline Loss.
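To make the Chamfer idea concrete, here is a minimal sketch in plain Python. The helper names `resample` and `chamfer`, the sampling `step`, and the choice to average each direction and then sum the two are assumptions made for illustration, not the paper's exact formulation.

```python
import math

def resample(polyline, step=0.5):
    """Densely sample points along a polyline given as [(x, y), ...]."""
    points = [polyline[0]]
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        n = max(1, math.ceil(seg_len / step))  # number of sub-steps on this segment
        for i in range(1, n + 1):
            t = i / n
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points

def chamfer(poly_a, poly_b, step=0.5):
    """Symmetric Chamfer distance between two densely sampled polylines.

    Assumption: each direction is the mean nearest-neighbor distance,
    and the two directions are summed.
    """
    a, b = resample(poly_a, step), resample(poly_b, step)

    def one_way(src, dst):
        total = 0.0
        for px, py in src:
            total += min(math.hypot(px - qx, py - qy) for qx, qy in dst)
        return total / len(src)

    return one_way(a, b) + one_way(b, a)

# Same straight line, annotated with two vs. three control points.
two_points = [(0.0, 0.0), (10.0, 0.0)]
three_points = [(0.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
same_line_loss = chamfer(two_points, three_points)

# A parallel line offset by one unit.
shifted = [(0.0, 1.0), (10.0, 1.0)]
shifted_loss = chamfer(two_points, shifted)
```

Here `same_line_loss` comes out (numerically) zero because both annotations trace the same curve, while `shifted_loss` reflects the one-unit offset in each direction. With a sampling step that does not divide the segment lengths evenly, the first value would be small rather than exactly zero.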
Key ideas
- Loss: Chamfer distance.
- Evaluated on densely sampled polyline points.
- Adding or removing a control point on a straight line does not change the loss.
- Curve matching: Dilate each curve by a radius, then compare IoU. This can be seen as an alternative to Chamfer distance for comparing two curves.
- Given an initial point
- Predict turning angle
- Predict next node location
- Predict status (merge, fork, continue)
- DT (distance transform) is an efficient feature for mapping
- Thresholded inverted DT
- Encodes at each point of the image the relative distance to the closest lane boundary.
- Threshold, binarize and skeletonize DT and use the endpoints as seeds. –> How?
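The thresholded inverted DT described above can be sketched as follows. This is a brute-force illustration in plain Python, assuming a binary mask of lane-boundary pixels as input; the function name `thresholded_inverted_dt` and the `threshold` parameter are hypothetical, and the paper's actual threshold value is not restated here.

```python
import math

def thresholded_inverted_dt(mask, threshold):
    """Return threshold - min(d, threshold) per pixel, where d is the Euclidean
    distance (in pixels) to the closest lane-boundary pixel.

    mask: 2D list of 0/1 values, 1 marking lane-boundary pixels.
    The result peaks at `threshold` on a boundary and decays to 0 at
    `threshold` pixels away or farther.
    """
    boundary = [(r, c)
                for r, row in enumerate(mask)
                for c, value in enumerate(row) if value]
    result = []
    for r, row in enumerate(mask):
        out_row = []
        for c in range(len(row)):
            # Brute-force nearest boundary pixel; infinite if there is none.
            d = min((math.hypot(r - br, c - bc) for br, bc in boundary),
                    default=math.inf)
            out_row.append(threshold - min(d, threshold))
        result.append(out_row)
    return result

# A 5x5 patch with one vertical lane boundary in the middle column.
mask = [[0, 0, 1, 0, 0] for _ in range(5)]
dt = thresholded_inverted_dt(mask, threshold=2)
```

Each row of `dt` is `[0, 1, 2, 1, 0]` for this patch: the value is highest on the boundary and falls off with distance. A real pipeline would use an optimized distance transform rather than this quadratic loop.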
Technical details
- HD maps
- contain information about the location of lanes, lane line types, crosswalks, traffic lights, rules at intersections, etc.
- HD maps have cm-level accuracy.
- Semantic landmarks in HD maps are annotated by hand in a BEV image.
- Resolution: 5 cm / pixel
- Results:
- P/R/F1 = 0.76 @ 2 pix = 10 cm threshold. This is evaluated with the densely sampled polyline points.
- P/R/F1 = 0.96 @ 10 pix = 50 cm.
- Topology accuracy = 0.89
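One plausible reading of the thresholded precision/recall metric is sketched below. The matching rule (a sampled point counts as correct if any point of the other polyline lies within the threshold) and the function name `precision_recall_f1` are assumptions for illustration; the paper's exact protocol may differ. Only the 5 cm / pixel resolution is taken from the notes above.

```python
import math

CM_PER_PIXEL = 5  # BEV resolution stated in the notes

def precision_recall_f1(pred_pts, gt_pts, threshold_px):
    """Point-wise precision, recall and F1 between two densely sampled
    polylines, using a distance threshold in pixels."""
    def fraction_within(src, dst):
        hits = sum(
            1 for px, py in src
            if any(math.hypot(px - qx, py - qy) <= threshold_px for qx, qy in dst)
        )
        return hits / len(src)

    precision = fraction_within(pred_pts, gt_pts)  # predicted points near ground truth
    recall = fraction_within(gt_pts, pred_pts)     # ground-truth points near a prediction
    total = precision + recall
    f1 = 0.0 if total == 0 else 2 * precision * recall / total
    return precision, recall, f1

# Toy example: the prediction follows the ground truth for half its length,
# then drifts 5 pixels (25 cm) away.
gt = [(x, 0) for x in range(10)]
pred = [(x, 1) for x in range(5)] + [(x, 5) for x in range(5, 10)]

strict = precision_recall_f1(pred, gt, threshold_px=2)   # 2 pix = 10 cm
loose = precision_recall_f1(pred, gt, threshold_px=10)   # 10 pix = 50 cm
threshold_cm = 2 * CM_PER_PIXEL
```

In this toy case the strict threshold penalizes the drifting half of the prediction, while the loose threshold accepts every point, which mirrors the gap between the 2 pix and 10 pix results reported above.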
Notes
- Many earlier mapping papers focus only on coarse-level mapping (no lane-level information), such as PolyMapper. They focus on road network extraction and semantic labeling, and are not suitable for autonomous driving.
- HD map + DL papers include