September 2020
tl;dr: Calculate the effective number of samples for each class to get a better-weighted loss.
This paper reminds me of the effective receptive field paper from Uber ATG, which basically says that the effective RF grows only with sqrt(N) as the network gets deeper (N being the number of layers), rather than linearly.
This paper starts from a few basic assumptions and derives a general formula for the effective number of samples of each class, which is then used to weight the loss. The effective number of samples is defined as the expected volume of samples and can be calculated with a simple formula, $(1-\beta^N)/(1-\beta)$, where $N$ is the number of samples and $\beta \in [0, 1)$ is a hyperparameter. Each class is then weighted inversely to its effective number instead of inversely to $N$.
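As a worked check (my own derivation from the formula above, writing $E_N$ for the effective number and $N_i$ for the sample count of class $i$), the formula is just a geometric series, and its two limits recover the familiar extremes: $\beta = 0$ means no re-weighting at all, while $\beta \to 1$ gives plain inverse-frequency ($1/N$) weighting.

$$
E_N = \sum_{j=1}^{N} \beta^{j-1} = \frac{1-\beta^N}{1-\beta}
$$

$$
\beta = 0 \;\Rightarrow\; E_N = 1, \qquad
\lim_{\beta \to 1} E_N = \lim_{\beta \to 1} \left(1 + \beta + \dots + \beta^{N-1}\right) = N
$$

$$
w_i \propto \frac{1}{E_{N_i}} = \frac{1-\beta}{1-\beta^{N_i}}
$$

Since $1-\beta^N \le 1$, the effective number is also capped at $1/(1-\beta)$, so very large classes stop gaining effective samples.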
Other works seem to have noticed this diminishing return of extra samples and use simple heuristics to counter it. For example, PyrOccNet noticed that weighting by $1/N$ biases the loss too heavily toward the minority classes, and thus simply uses $1/\sqrt{N}$ as the weighting factor.
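A minimal Python sketch comparing the three weighting schemes mentioned here on a toy long-tailed dataset. The function names, the scheme labels, the default $\beta = 0.999$, the class counts, and the convention of rescaling the weights to sum to the number of classes are all my assumptions for illustration; this is not the reference code of this paper or of PyrOccNet. Each per-class weight would multiply the loss (e.g., cross-entropy) of the samples whose ground-truth label is that class.

```python
import math


def effective_number(n: int, beta: float) -> float:
    """Effective number of samples E_n = (1 - beta^n) / (1 - beta), beta in [0, 1)."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must be in [0, 1)")
    return (1.0 - beta ** n) / (1.0 - beta)


def normalize(weights):
    """Rescale so the weights sum to the number of classes (assumed convention)."""
    total = sum(weights)
    return [w * len(weights) / total for w in weights]


def class_weights(counts, scheme="effective", beta=0.999):
    """Per-class loss weights for hypothetical scheme names.

    "inverse":      1 / N               (plain inverse frequency)
    "inverse_sqrt": 1 / sqrt(N)         (the PyrOccNet-style heuristic)
    "effective":    1 / E_N             (class-balanced, effective number)
    """
    if scheme == "inverse":
        raw = [1.0 / n for n in counts]
    elif scheme == "inverse_sqrt":
        raw = [1.0 / math.sqrt(n) for n in counts]
    elif scheme == "effective":
        raw = [1.0 / effective_number(n, beta) for n in counts]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return normalize(raw)


if __name__ == "__main__":
    counts = [10000, 1000, 100, 10]  # hypothetical long-tailed class counts
    for scheme in ("inverse", "inverse_sqrt", "effective"):
        print(scheme, [round(w, 3) for w in class_weights(counts, scheme)])
```

With these counts, $\beta = 0.999$ caps the effective number of the largest class at $1/(1-\beta) = 1000$, so the head-to-tail weight ratio ends up far less extreme than with $1/N$, and $\beta = 0$ collapses to uniform weights.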