October 2020
tl;dr: Improved DETR that trains faster and performs better on small objects.
Issues with DETR: it needs many training epochs to converge, and it performs poorly at detecting small objects. DETR uses low-resolution feature maps to save computation, which hurts small objects.
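Why low resolution saves so much: DETR's encoder applies full self-attention, where every pixel of the flattened feature map attends to every other pixel, so the number of query-key pairs grows quadratically with the pixel count. The short sketch below counts those pairs for a hypothetical 640x640 input at different feature-map strides; the input size and the helper name `attention_pairs` are illustrative assumptions, not taken from the paper.

```python
def attention_pairs(height: int, width: int) -> int:
    """Number of query-key pairs in full self-attention over an H x W map."""
    n = height * width  # every pixel is one token
    return n * n        # each token attends to every token, itself included

# Hypothetical 640 x 640 input, chosen only for illustration.
image_h, image_w = 640, 640
for stride in (32, 16, 8):
    h, w = image_h // stride, image_w // stride
    print(f"stride {stride:2d}: {h}x{w} map, {attention_pairs(h, w):,} pairs")
```

Going from stride 32 to stride 8 gives 16x more pixels but 256x more attention pairs, which is why dense attention pushes DETR toward coarse maps where small objects cover very few pixels.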
Deformable DETR first reduces computation by attending to only a small set of key sampling points around a reference point. It then uses a multi-scale deformable attention module to aggregate multi-scale features (without an FPN) to help small object detection.
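A minimal single-head, single-query sketch of multi-scale deformable attention in NumPy, assuming the offsets and attention logits are already given (in the model they are predicted from the query by linear layers, and multiple heads with value/output projections are used; all of that is omitted here). The output is a weighted sum of bilinearly sampled features at K points per level, with one softmax over all L*K points. The assumptions are labeled: offsets are expressed in pixels of each level, pixel centres sit at `(index + 0.5) / size` in normalized coordinates, out-of-range samples read as zero, and the function names and level sizes are hypothetical.

```python
import numpy as np

def bilinear_sample(feat: np.ndarray, x: float, y: float) -> np.ndarray:
    """Bilinearly sample feat (H, W, C) at continuous pixel coords (x, y).

    Pixel (row i, col j) is centred at (x=j, y=i); out-of-range neighbours count as zero.
    """
    h, w, c = feat.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    out = np.zeros(c)
    for dy in (0, 1):
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
            if 0 <= xi < w and 0 <= yi < h:
                out += weight * feat[yi, xi]
    return out

def ms_deform_attn_single_head(feats, ref_point, offsets, logits):
    """One query, one head (hypothetical helper).

    feats:     list of L arrays; level l has shape (H_l, W_l, C)
    ref_point: (2,) normalized (x, y) in [0, 1], shared by all levels
    offsets:   (L, K, 2) sampling offsets in pixels of each level (assumed units)
    logits:    (L, K) unnormalized attention scores
    """
    num_levels, num_points = logits.shape
    # Softmax over all L*K sampled points, so the weights sum to 1 across levels.
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    out = np.zeros(feats[0].shape[-1])
    for l, feat in enumerate(feats):
        h, w, _ = feat.shape
        # Map the normalized reference point onto this level's pixel grid.
        base_x = ref_point[0] * w - 0.5
        base_y = ref_point[1] * h - 0.5
        for k in range(num_points):
            sx = base_x + offsets[l, k, 0]
            sy = base_y + offsets[l, k, 1]
            out += weights[l, k] * bilinear_sample(feat, sx, sy)
    return out

# Usage with four hypothetical levels and K = 4 points per level.
rng = np.random.default_rng(0)
C, K = 4, 4
shapes = [(32, 32), (16, 16), (8, 8), (4, 4)]
feats = [rng.standard_normal((h, w, C)) for h, w in shapes]
offsets = rng.normal(scale=2.0, size=(len(shapes), K, 2))
logits = rng.standard_normal((len(shapes), K))
out = ms_deform_attn_single_head(feats, np.array([0.5, 0.5]), offsets, logits)
print(out.shape)  # (4,)
```

Because the same normalized reference point is rescaled to every level, one query can pull fine detail from the high-resolution level and context from the coarse ones, which is how the module fuses scales without an FPN.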
Each object query is restricted to attend to a small set of key sampling points around its reference point instead of all points in the feature map.
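To make the saving concrete, the sketch below counts how many keys a single query touches under dense attention over four feature levels versus deformable attention with M heads, L levels and K points per level. The hypothetical 640x640 input, the strides 8/16/32/64, the helper names, and the setting M = 8, L = 4, K = 4 are assumptions chosen for illustration.

```python
def keys_per_query_dense(feature_shapes):
    """Keys one query attends to if it looks at every pixel of every level."""
    return sum(h * w for h, w in feature_shapes)

def keys_per_query_deformable(num_heads, num_levels, num_points):
    """Keys one query samples with deformable attention: M * L * K."""
    return num_heads * num_levels * num_points

# Hypothetical 640 x 640 input with levels at strides 8, 16, 32, 64.
shapes = [(640 // s, 640 // s) for s in (8, 16, 32, 64)]
dense = keys_per_query_dense(shapes)
sparse = keys_per_query_deformable(num_heads=8, num_levels=4, num_points=4)
print(dense, sparse)  # 8500 128
```

The deformable count does not depend on the feature-map size at all, so high-resolution levels can be used without the quadratic blow-up of dense attention.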
Deformable DETR is one of the highest-scored papers at ICLR 2021.
There are several papers on improving the training speed of DETR.