Mar 2019
tl;dr: Aggregate CNN features from multiple 2D projections of a 3D object to obtain a high-quality 3D feature.
The idea of using pre-trained and fine-tuned CNNs to extract 2D features is well established. This paper explores how to effectively aggregate these per-view 2D features (by concatenation, average pooling, or max pooling) into a single high-quality descriptor for the 3D object. Since this descriptor should be insensitive to both the number of 2D projections and the ordering of the per-view features (an orderless set), element-wise max pooling across views is a natural choice; see the sketch below. In addition, learning a low-rank Mahalanobis metric on top of the aggregated feature significantly boosts retrieval performance.
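A minimal sketch of the cross-view aggregation step, assuming a torchvision ResNet-18 as a stand-in for the 2D feature extractor (the function name and backbone are illustrative, not the authors' code):

```python
import torch
import torchvision.models as models

# Hypothetical backbone: a pre-trained CNN used as the per-view 2D feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # keep the 512-d penultimate features
backbone.eval()

def aggregate_views(views: torch.Tensor) -> torch.Tensor:
    """Aggregate per-view CNN features into one orderless 3D descriptor.

    views: (num_views, 3, H, W) rendered 2D projections of one 3D object.
    Returns a (feature_dim,) descriptor that is invariant to the number
    and ordering of the views (element-wise max pooling across views).
    """
    with torch.no_grad():
        per_view = backbone(views)          # (num_views, 512)
    descriptor, _ = per_view.max(dim=0)     # max pool across the view axis
    return descriptor

# Example: 12 rendered views of a single shape.
views = torch.randn(12, 3, 224, 224)
print(aggregate_views(views).shape)         # torch.Size([512])
```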
Because the learned metric M is positive semi-definite, it can be factored as M = U D U^T = W^T W with W = D^{1/2} U^T, so d_M(x, y) = (x − y)^T M (x − y) = ||W x − W y||^2, i.e., ordinary Euclidean distance after the linear map W. Thus learning a Mahalanobis metric corresponds to learning a linear transformation of the data! If some eigenvalues (diagonal entries of D) are zero, W has reduced rank and the metric effectively performs dimensionality reduction.
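A quick numerical sketch of this equivalence (NumPy only; the rank-2 W below is an arbitrary illustration, not the metric learned in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 2                        # original dimension, reduced rank

# A low-rank Mahalanobis metric M = W^T W with rank k < d.
W = rng.standard_normal((k, d))
M = W.T @ W

x, y = rng.standard_normal(d), rng.standard_normal(d)

# Mahalanobis distance under M ...
d_mahalanobis = (x - y) @ M @ (x - y)
# ... equals squared Euclidean distance after the linear map W,
# i.e. after projecting into a k-dimensional space (dimensionality reduction).
d_euclidean = np.sum((W @ x - W @ y) ** 2)

print(np.isclose(d_mahalanobis, d_euclidean))   # True
```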
- It is quite similar to NCA (neighborhood components analysis), and different from PCA (unsupervised; finds the directions along which the variance of the whole data set is largest) and LDA (linear discriminant analysis; supervised, finds directions along which data from the same class cluster together while data from different classes are separated). A side-by-side sketch of the three follows below.
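For intuition, a scikit-learn comparison of the three projections on a toy dataset (illustrative only, not part of the paper's pipeline):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import NeighborhoodComponentsAnalysis

X, y = load_iris(return_X_y=True)

# PCA: unsupervised, keeps the directions of largest overall variance.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: supervised, keeps directions that separate the class means.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

# NCA: supervised, learns a linear map that improves nearest-neighbor
# classification -- the same flavor of metric learning used for retrieval.
X_nca = NeighborhoodComponentsAnalysis(
    n_components=2, random_state=0
).fit_transform(X, y)

print(X_pca.shape, X_lda.shape, X_nca.shape)    # (150, 2) each
```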