September 2019
tl;dr: The first paper to demonstrate scale consistency over long videos; it can achieve better performance than stereo.
The follow-up paper is DF-VO, which predicts dense optical flow and uses 2D-2D matching to estimate ego-motion, achieving even more accurate VO.
Introducing depth scale consistency is the key to the good performance on relative pose estimation, and this is what makes the method usable for VO.
sfm-learner actually does not perform that well on VO: its scale and rotation drift are large. See scale-consistent sfm-learner for better VO performance.
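
To see why scale consistency matters for VO, here is a toy sketch (not the paper's method or evaluation code). It chains per-frame relative translations into a trajectory, once with a single unknown scale shared by all frames and once with a different random scale per frame, then aligns each trajectory to ground truth with one global scale factor. The 2D translation-only motion, the scale range, and all function names (`accumulate`, `align_global_scale`, `ate_rmse`) are illustrative assumptions.

```python
import random


def accumulate(rel_translations):
    """Chain per-frame relative translations into a list of 2D positions."""
    x, y = 0.0, 0.0
    traj = [(x, y)]
    for dx, dy in rel_translations:
        x, y = x + dx, y + dy
        traj.append((x, y))
    return traj


def align_global_scale(pred, gt):
    """Apply the single least-squares scale s that minimizes sum ||s*pred - gt||^2."""
    num = sum(px * gx + py * gy for (px, py), (gx, gy) in zip(pred, gt))
    den = sum(px * px + py * py for px, py in pred)
    s = num / den
    return [(s * px, s * py) for px, py in pred]


def ate_rmse(pred, gt):
    """Root-mean-square position error between two trajectories."""
    sq = [(px - gx) ** 2 + (py - gy) ** 2 for (px, py), (gx, gy) in zip(pred, gt)]
    return (sum(sq) / len(sq)) ** 0.5


# Hypothetical ground-truth motion: mostly forward with a small lateral component.
gt_steps = [(1.0, 0.1 * (i % 5)) for i in range(200)]
gt_traj = accumulate(gt_steps)

# Scale-consistent prediction: every frame shares the same unknown scale (0.3 here).
consistent = [(0.3 * dx, 0.3 * dy) for dx, dy in gt_steps]

# Scale-inconsistent prediction: each frame pair gets its own random scale.
rng = random.Random(0)
inconsistent = []
for dx, dy in gt_steps:
    s = rng.uniform(0.1, 0.5)
    inconsistent.append((s * dx, s * dy))

err_consistent = ate_rmse(align_global_scale(accumulate(consistent), gt_traj), gt_traj)
err_inconsistent = ate_rmse(align_global_scale(accumulate(inconsistent), gt_traj), gt_traj)

print(f"consistent scale:   ATE RMSE = {err_consistent:.6f}")
print(f"inconsistent scale: ATE RMSE = {err_inconsistent:.6f}")
```

With a shared scale, a single global alignment recovers the trajectory up to floating-point error. With per-frame scales, no single factor can fix the accumulated trajectory, which is the kind of scale drift noted above for sfm-learner.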