May 2020
tl;dr: Improve batch normalization when minibatch size is small.
Like GroupNorm, it targets the case where the batch size is small, but it does so by accumulating statistics over mini-batches from recent iterations. The catch is that the weights change at every iteration, so statistics collected under the old weights are no longer accurate under the current weights, and a naive average of them would be wrong. Fortunately, weights change gradually, so Cross-Iteration Batch Normalization (CBN) estimates the current statistics from the k previous iterations with the adjustment below.
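In code, the compensation step looks roughly like this. This is a minimal NumPy sketch, not the authors' implementation: the function name, argument names, and shapes are my own, while the first-order Taylor correction of the stored statistics and the clipping of ν against μ² come from the paper.

```python
import numpy as np

def compensated_stats(mu_hist, nu_hist, dmu_dtheta_hist, dnu_dtheta_hist,
                      theta_hist, theta_now):
    """Approximate current-iteration mean/variance from stats saved at the
    k most recent iterations (hypothetical helper, for illustration only).

    Per stored iteration i:
      mu_hist[i], nu_hist[i]              : per-channel mean and mean of squares, shape [C]
      dmu_dtheta_hist[i], dnu_dtheta_hist[i] : their gradients w.r.t. the layer
                                               weights, shape [C, P]
      theta_hist[i]                       : flattened layer weights then, shape [P]
    theta_now : flattened current layer weights, shape [P]
    """
    mus, nus = [], []
    for mu, nu, dmu, dnu, theta in zip(mu_hist, nu_hist,
                                       dmu_dtheta_hist, dnu_dtheta_hist,
                                       theta_hist):
        delta = theta_now - theta          # how far the weights have moved
        mu_c = mu + dmu @ delta            # first-order Taylor compensation of the mean
        nu_c = nu + dnu @ delta            # same for the mean of squares
        # clip so the implied variance (nu - mu^2) stays non-negative
        nus.append(np.maximum(nu_c, mu_c ** 2))
        mus.append(mu_c)
    mu_bar = np.mean(mus, axis=0)          # average compensated stats over k iterations
    var_bar = np.mean(nus, axis=0) - mu_bar ** 2
    return mu_bar, var_bar
```

The averaged mean and variance then replace the single-batch statistics in the usual BN normalization step.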