January 2023
tl;dr: Large scale visual pretraining for policy learning.
Overall impression
The idea is interesting: how to use large scale pretraining to extract driving relevant information. The
Key ideas
- Step1: use large scale uncalibrated driving video to train depthNet and poseNet, a la SfMLearner. The input is two consecutive frames at 1 Hz.
- Step2: from a single image, predict the ego motion. –> This is highly questionable. It would be better to feed in multiple historical frames, and also historical ego motion information. If historical information is important for prediction tasks, why not for planning?
Technical details
- Summary of technical details
Notes
- Questions and notes on how to improve/revise the current work