January 2020
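tl;dr: Social LSTM predicts pedestrian trajectories with one LSTM per person and lets nearby LSTMs share information through a social pooling layer.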
Overall impression
Adds a social pooling layer that pools the hidden states of the neighbors within a spatial radius.
Key ideas
- Instead of a binary spatial occupancy grid around each person, each grid cell holds the sum of the LSTM hidden states (embeddings) of the neighbors that fall into that cell.
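
A minimal sketch of the pooling idea above, assuming a square neighborhood centered on the person of interest. The function name `social_pooling`, the parameters `grid_size` and `neighborhood`, and the default values are hypothetical choices for illustration, not taken from the paper's code.

```python
import numpy as np

def social_pooling(positions, hidden, i, grid_size=4, neighborhood=2.0):
    """Build a social tensor for agent i (illustrative sketch).

    positions: (N, 2) array of xy positions at the current step
    hidden:    (N, D) array of per-agent LSTM hidden states
    Returns a (grid_size, grid_size, D) tensor in which each cell holds
    the sum of the hidden states of the neighbors located in that cell.
    """
    n_agents, dim = hidden.shape
    tensor = np.zeros((grid_size, grid_size, dim))
    cell = neighborhood / grid_size   # side length of one grid cell
    half = neighborhood / 2.0         # grid is centered on agent i

    for j in range(n_agents):
        if j == i:
            continue  # an agent does not pool its own state
        dx, dy = positions[j] - positions[i]
        # Skip neighbors outside the square neighborhood.
        if not (-half <= dx < half and -half <= dy < half):
            continue
        # Map the relative offset to a cell index (clamped for float safety).
        m = min(int((dx + half) // cell), grid_size - 1)
        n = min(int((dy + half) // cell), grid_size - 1)
        tensor[m, n] += hidden[j]
    return tensor

# Toy example: agents 1 and 2 are close to agent 0, agent 3 is far away.
positions = np.array([[0.0, 0.0], [0.3, 0.3], [0.4, 0.2], [5.0, 5.0]])
hidden = np.arange(12, dtype=float).reshape(4, 3)
social_tensor = social_pooling(positions, hidden, i=0)
```

A plain occupancy grid would store only a 0/1 flag (or a count) per cell. Here the cell carries the neighbors' hidden states, so the pooled tensor can also encode how those neighbors have been moving.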
Technical details
- Social LSTM actually works on data from a surveillance viewpoint, which sits somewhere between an onboard perspective camera and a bird's-eye view (BEV).
Notes
- talk at CVPR: the animation of predicting a person passing through the gap of a crowd is cool.