October 2020
tl;dr: Generalized focal loss extends focal loss so it can optimize for any continuous target and for a distribution. It uses a joint cls-IoU representation to predict localization quality, and predicts a distribution over box locations.
The paper follows the recipe of focal loss (modulating cross entropy with a factor that depends on the prediction error). Cross entropy can easily be extended to regress any number between 0 and 1, but the extended loss has a very flat bottom. Generalized focal loss modulates this extended cross entropy with an L2-style factor, |y - sigma|^beta with beta = 2.
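A minimal sketch of this modulation for a single scalar prediction may help. The function names are made up for these notes, and `sigma` is assumed to be a sigmoid output strictly between 0 and 1. It only illustrates that the modulated loss reaches zero at `sigma == y`, while the plain soft cross entropy does not.

```python
import math

def soft_cross_entropy(sigma, y):
    """Cross entropy extended to a continuous target y in [0, 1]."""
    return -((1.0 - y) * math.log(1.0 - sigma) + y * math.log(sigma))

def quality_focal_loss(sigma, y, beta=2.0):
    """Soft cross entropy scaled by |y - sigma|**beta (beta=2 is the squared error)."""
    return abs(y - sigma) ** beta * soft_cross_entropy(sigma, y)

y = 0.7  # e.g. the IoU of a positive box
for sigma in (0.3, 0.6, 0.7, 0.8):
    ce = soft_cross_entropy(sigma, y)
    qfl = quality_focal_loss(sigma, y)
    print(f"sigma={sigma:.1f}  soft CE={ce:.4f}  modulated={qfl:.4f}")
```

The soft cross entropy is smallest at `sigma == y` but stays well above zero there, whereas the modulated loss vanishes at that point and grows quickly for predictions far from the target.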
A recent trend in one-stage detectors is to add a separate prediction branch that estimates localization quality. The center-ness branch (FCOS and ATSS) or IoU score branch (IoUNet) is trained separately and then used in the NMS process. However, this makes the quality predictor inconsistent between training and test. Concretely, negative boxes receive no IoU supervision, so they can get extremely high IoU predictions and degrade the NMS process.
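The joint cls-IoU representation from the tl;dr addresses this by folding the quality score into the classification target, so negatives are explicitly supervised toward zero. The sketch below shows one way such a target could be built for a single box; the helper name and its arguments are hypothetical and not taken from the paper's code.

```python
def joint_quality_target(num_classes, class_index=None, iou=0.0):
    """Per-class soft target for one box.

    A positive box carries its IoU with the matched ground truth at the
    ground-truth class index. A negative box (class_index=None) is all zeros,
    so its quality score is supervised as well.
    """
    target = [0.0] * num_classes
    if class_index is not None:
        target[class_index] = iou
    return target

positive = joint_quality_target(3, class_index=1, iou=0.82)
negative = joint_quality_target(3)
print(positive)
print(negative)
```

With a target like this, the single predicted score can rank boxes in NMS directly, instead of multiplying a class score by a separately trained quality estimate.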
Bbox boundaries are generally formulated as a Dirac delta function (deterministic) or a Gaussian (Gaussian YOLO and KL Loss). This paper aims to formulate the boundary as an arbitrarily shaped distribution. This formulation by itself only matches the baseline, but combined with DFL (distribution focal loss) it performs better. --> for loss on a distribution, cf UnsuperPoint.
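As a rough sketch of the discrete version of this idea, each box side can be predicted as a softmax over integer bins and decoded by its expectation, with a DFL-style term applying cross entropy on the two bins that bracket the target. Unit bin spacing, the bin count, and all function names here are assumptions made for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_offset(probs):
    """Decode the offset as the mean of the distribution over bins 0..n."""
    return sum(i * p for i, p in enumerate(probs))

def distribution_focal_loss(probs, y):
    """Cross entropy on the two integer bins that bracket the target y."""
    left = min(int(math.floor(y)), len(probs) - 2)
    right = left + 1
    return -((right - y) * math.log(probs[left])
             + (y - left) * math.log(probs[right]))

y = 2.5
sharp = softmax([0.0, 0.0, 4.0, 4.0, 0.0, 0.0])  # mass concentrated around y
flat = softmax([1.0] * 6)                        # uniform over all bins

for name, probs in (("sharp", sharp), ("flat", flat)):
    print(name, expected_offset(probs), distribution_focal_loss(probs, y))
```

Both distributions decode to the same expected offset, so a loss on the decoded value alone cannot tell them apart. The DFL-style term is lower for the sharp one, which is one way to read why the distribution only helps once DFL is added.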
The encoding of the regression target is similar to the one-hot encoding used in depth regression networks, such as single-modal weighted average (SMWA) and Depth Coefficient.
The dispersion of the distribution can also be used as a localization confidence. In a way this achieves what uncertainty learning (Gaussian YOLO and KL Loss) tries to achieve, but without the explicit uncertainty term, which can be hard to train in practice. --> this is actually exactly what the improved version GFocalV2 does.
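To make "dispersion as confidence" concrete, here is a small sketch that compares a peaked and a flat bin distribution using two simple statistics. The standard deviation and the top-k probability mass are illustrative choices for these notes, not the exact quantities used in either paper.

```python
import math

def distribution_stats(probs):
    """Mean and standard deviation of a discrete distribution over bins 0..n."""
    mean = sum(i * p for i, p in enumerate(probs))
    var = sum(p * (i - mean) ** 2 for i, p in enumerate(probs))
    return mean, math.sqrt(var)

def topk_mass(probs, k=2):
    """Total probability held by the k most likely bins."""
    return sum(sorted(probs, reverse=True)[:k])

sharp = [0.01, 0.02, 0.47, 0.47, 0.02, 0.01]  # confident boundary
flat = [1.0 / 6] * 6                          # ambiguous boundary

for name, probs in (("sharp", sharp), ("flat", flat)):
    mean, std = distribution_stats(probs)
    print(f"{name}: mean={mean:.2f} std={std:.2f} top2={topk_mass(probs):.2f}")
```

A small spread or a large top-k mass indicates a well-localized boundary, so either statistic could serve as a confidence signal without training a separate variance output.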