Feb 2019
tl;dr: Introduce a new dimension, cardinality, alongside the depth (#layers) and width (#channels) of a CNN. Splitting the computation from width to cardinality leads to better performance.
The paper exploits the multi-path (split-transform-merge) strategy, simplifies the design rules and introduces a new dimension. The better performance than ResNet is strong testimony. This should be compared to other papers like Xception and MobileNet to see how they fare against each other.
Cardinality is the size of the set of transformations. Increasing cardinality is more effective than going deeper or wider when increasing the capacity (FLOPs, #params).
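To make "capacity" concrete, here is a rough sketch that counts convolution weights for a plain bottleneck block and for a block of C parallel bottleneck paths. The function names are hypothetical, biases and BatchNorm are ignored, and the configuration (256-d input, width 64 versus 32 paths of width 4) is an illustrative setting not stated in these notes.

```python
def bottleneck_params(in_ch, width, out_ch, k=3):
    """Weights in a 1x1 -> kxk -> 1x1 bottleneck (biases and BatchNorm ignored)."""
    return in_ch * width + k * k * width * width + width * out_ch


def aggregated_params(in_ch, cardinality, d, out_ch, k=3):
    """Weights in `cardinality` parallel bottleneck paths, each of width d."""
    return cardinality * bottleneck_params(in_ch, d, out_ch, k)


# Assumed example setting: one wide path vs. 32 narrow paths.
resnet = bottleneck_params(256, 64, 256)
resnext = aggregated_params(256, 32, 4, 256)
print(resnet, resnext)
```

Under these assumptions the two blocks land at roughly the same weight count, so the comparison between them isolates the effect of cardinality rather than of a larger budget.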
The aggregated transformation can be expressed as: \(y = x + \sum_{i=1}^C T_i(x)\)
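A minimal NumPy sketch of this formula, assuming each \(T_i\) is a tiny bottleneck (project down, ReLU, project up) on a flat vector rather than a convolution on a feature map. The sizes, weight scale and function names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, C = 16, 4, 8  # toy sizes: input dim, per-path width, cardinality

# One (down, up) weight pair per path; every path shares the same topology.
paths = [
    (rng.standard_normal((d, D)) * 0.1, rng.standard_normal((D, d)) * 0.1)
    for _ in range(C)
]


def transform(x, w_down, w_up):
    """T_i(x): project to d dims, apply ReLU, project back to D dims."""
    return w_up @ np.maximum(w_down @ x, 0.0)


def aggregated_block(x, paths):
    """y = x + sum_i T_i(x): residual shortcut plus the summed path outputs."""
    return x + sum(transform(x, wd, wu) for wd, wu in paths)


x = rng.standard_normal(D)
y = aggregated_block(x, paths)
```

Here `len(paths)` plays the role of the cardinality C: adding or removing paths changes the size of the set of transformations without touching depth.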
Split-transform-merge strategy of the Inception module
The solution space of an Inception module is a strict subspace of the solution space of a single larger layer (e.g., 5x5) operating on a higher-dimensional embedding. The split-transform-merge behavior is expected to approach the representational power of large, dense layers, but at a considerably lower computational complexity (and it is easier to train).
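The "strict subspace" point can be illustrated with a simplified sketch. It assumes plain linear branches and a split done by slicing the input into equal chunks (as in a grouped convolution), which is not how a real Inception module splits its input. Under that assumption, the merged branches equal one dense layer whose off-diagonal blocks are forced to zero, using 1/C of the weights. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D_in, D_out, C = 12, 8, 4  # toy sizes; both dims must be divisible by C
g_in, g_out = D_in // C, D_out // C

# Transform: one small weight matrix per branch.
branch_weights = [rng.standard_normal((g_out, g_in)) for _ in range(C)]


def split_transform_merge(x):
    """Split x into C chunks, transform each chunk, merge by concatenation."""
    chunks = np.split(x, C)
    return np.concatenate([w @ c for w, c in zip(branch_weights, chunks)])


# The same map written as a single dense layer with a block-diagonal weight.
dense = np.zeros((D_out, D_in))
for i, w in enumerate(branch_weights):
    dense[i * g_out:(i + 1) * g_out, i * g_in:(i + 1) * g_in] = w

x = rng.standard_normal(D_in)
same_output = np.allclose(split_transform_merge(x), dense @ x)

branch_param_count = C * g_in * g_out  # weights actually stored by the branches
dense_param_count = D_in * D_out       # weights of an unconstrained dense layer
```

An unconstrained dense layer can represent every block-diagonal map but not the other way around, which is the sense in which the branch design trades representational freedom for lower cost.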