April 2020
tl;dr: Learn a cost volume for planning and use diverse sampling to find the best trajectory.
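To make the tl;dr concrete, here is a minimal sketch of the selection step: score each candidate trajectory by summing the cost volume at its waypoints, then keep the cheapest one. It assumes a cost volume of shape `(T, H, W)` and trajectories given as integer grid indices. The function names `score_trajectories` and `select_best_trajectory` are hypothetical, and the hand-written candidates stand in for the paper's sampler. This is not the authors' implementation.

```python
import numpy as np


def score_trajectories(cost_volume, trajectories):
    """Sum the cost volume along each candidate trajectory.

    cost_volume:  (T, H, W) array, one cost map per future timestep.
    trajectories: (N, T, 2) integer array of (row, col) grid cells.
    Returns an (N,) array of total costs.
    """
    num_steps = cost_volume.shape[0]
    t_idx = np.arange(num_steps)      # (T,), broadcasts against (N, T)
    rows = trajectories[..., 0]       # (N, T)
    cols = trajectories[..., 1]       # (N, T)
    return cost_volume[t_idx, rows, cols].sum(axis=1)


def select_best_trajectory(cost_volume, trajectories):
    """Return the index of the lowest-cost candidate and all costs."""
    costs = score_trajectories(cost_volume, trajectories)
    return int(np.argmin(costs)), costs


# Toy cost volume (assumption): cost 1 everywhere except a free corridor
# along column 2, which has cost 0 at every timestep.
cost_volume = np.ones((3, 5, 5))
cost_volume[:, :, 2] = 0.0

# Three hand-written candidates standing in for sampled trajectories.
trajectories = np.array([
    [[0, 2], [1, 2], [2, 2]],   # stays in the free corridor
    [[0, 2], [1, 3], [2, 4]],   # drifts right out of the corridor
    [[0, 1], [1, 1], [2, 1]],   # stays one column to the left
])

best_idx, costs = select_best_trajectory(cost_volume, trajectories)
print(best_idx, costs)
```

In this toy setup the candidate that stays in the zero-cost corridor wins. In a learned system the cost volume would come from the network, and the candidate set would need to be diverse enough to contain a good trajectory.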
Prediction (motion forecasting) tackles the problem of estimating the future positions of all actors as well as their intentions (changing lanes, parking). Finally, motion planning takes the output of the previous stacks and generates a safe trajectory for the SDV to execute via a control system.
Ego-motion forecasting provides a strong cue on how the SDV will move in the future. However, it may not use information about the dynamic environment, i.e., how other cars will move.
The authors argue that traditional perception metrics (such as mAP) may not be optimal. Such metrics weigh all actors uniformly, whereas nearby actors impact downstream modules more. However, large companies in industry still favor a decoupled stack, where large engineering teams work in parallel with task-specific objectives in mind. Advances in an upstream stack may not necessarily translate to overall system improvement. (I guess that is part of the responsibility of engineering management, to identify bottlenecks in the entire engineering stack.) Therefore the authors (Uber ATG) advocate for end-to-end systems.
The authors still use a perception loss, but only to guide the system and to provide interpretability to the end-to-end stack.
This work is based on Fast and Furious and IntentNet. It takes similar inputs: lidar data and semantic maps. It inspires MP3.