July 2020
tl;dr: A relatively large-scale dataset (8k training images) for crowded/dense human detection.
Overall impression
Overall not very impressive. It fails to cite a closely related dataset, CrowdHuman, and its ablation study of the crowding issue is also not very extensive.
Key ideas
- About 30 persons per image on average.
- Annotators mark the top of the head and the middle of the feet (similar to CityPersons). The bbox is then generated automatically with a fixed aspect ratio (width/height) of 0.41. This is
- Difficulty levels: > 100 pixels (easy), > 50 pixels (medium), > 20 pixels (hard), similar to WiderFace.
- NMS is a problem in crowded scenes, but this paper does not address it. Visibility Guided NMS may be worth trying.
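The box-generation and difficulty rules above can be sketched as follows. This is a minimal illustration, not the paper's code, and it rests on assumptions: image coordinates with y pointing down, a box horizontally centered on the midpoint of the head-feet line (the CityPersons convention), pixel thresholds applied to box height, and nested subsets as in WiderFace (the hard subset also contains medium and easy boxes). The names `box_from_line` and `difficulty_subsets` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

ASPECT_RATIO = 0.41  # fixed width / height, as in the notes

Point = Tuple[float, float]  # (x, y) in image coordinates, y pointing down


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def height(self) -> float:
        return self.y2 - self.y1


def box_from_line(head_top: Point, feet_mid: Point,
                  aspect_ratio: float = ASPECT_RATIO) -> Box:
    """Fixed-aspect-ratio box spanning the head-top to feet-middle line,
    horizontally centered on the line's midpoint (assumed convention)."""
    (xh, yh), (xf, yf) = head_top, feet_mid
    height = yf - yh
    if height <= 0:
        raise ValueError("feet point must lie below the head point")
    width = aspect_ratio * height
    cx = (xh + xf) / 2.0
    return Box(cx - width / 2.0, yh, cx + width / 2.0, yf)


def difficulty_subsets(box: Box) -> List[str]:
    """Subsets a box belongs to, assuming the pixel thresholds apply to box
    height and the subsets are nested (hard includes medium includes easy)."""
    h = box.height
    if h > 100:
        return ["easy", "medium", "hard"]
    if h > 50:
        return ["medium", "hard"]
    if h > 20:
        return ["hard"]
    return []  # smaller than every threshold: outside all subsets
```

For example, a head point at (200, 100) and a feet point at (210, 300) give a box 200 pixels tall and 82 pixels wide, which falls into all three subsets.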
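Why NMS hurts in crowds is easy to see with plain greedy NMS. The sketch below is the generic algorithm, not anything proposed in the paper, and the 0.5 IoU threshold is only a common default assumed for illustration. Two adjacent pedestrians whose boxes overlap with an IoU of about 0.61 collapse into a single detection.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_thr: float = 0.5) -> List[int]:
    """Standard greedy NMS: visit boxes by descending score and keep a box
    only if it overlaps every already-kept box by at most iou_thr."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep
```

`greedy_nms([(0, 0, 41, 100), (10, 0, 51, 100)], [0.9, 0.8])` keeps only index 0, even though both boxes may be true pedestrians. Raising the threshold keeps both, but also lets more duplicate detections through.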
Technical details
- pHash (perceptual hashing) is used to remove duplicate images.
- The annotation tool displays reference examples in its GUI.
- Evaluation metric: MR (log-average miss rate).
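A rough sketch of pHash-based deduplication, following the common recipe: take the 2-D DCT of a 32x32 grayscale image, keep the top-left 8x8 low-frequency coefficients, threshold them at their median to get 64 bits, and compare hashes by Hamming distance. It assumes images are already converted to grayscale and resized to 32x32. The paper does not state its hashing parameters, so these sizes and the `max_distance` threshold of 4 are assumptions, and `phash`/`dedup` are hypothetical names.

```python
import math
import statistics
from typing import List, Sequence

N = 32  # assumed input size: images pre-resized to N x N grayscale
K = 8   # keep the top-left K x K low-frequency DCT coefficients

# Precomputed DCT-II basis: COS[u][x] = cos(pi * (2x + 1) * u / (2N)).
COS = [[math.cos(math.pi * (2 * x + 1) * u / (2 * N)) for x in range(N)]
       for u in range(K)]


def phash(img: Sequence[Sequence[float]]) -> int:
    """64-bit perceptual hash of an N x N grayscale image (list of rows)."""
    coeffs: List[float] = []
    for u in range(K):          # vertical frequency
        for v in range(K):      # horizontal frequency
            # Unnormalized 2-D DCT-II coefficient (u, v); a uniform scale
            # factor would not change the comparison against the median.
            s = 0.0
            for x in range(N):
                row, cu = img[x], COS[u][x]
                for y in range(N):
                    s += row[y] * cu * COS[v][y]
            coeffs.append(s)
    med = statistics.median(coeffs)
    bits = 0
    for c in coeffs:            # one bit per coefficient: above the median?
        bits = (bits << 1) | int(c > med)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def dedup(images, max_distance: int = 4) -> List[int]:
    """Indices of images to keep; an image is dropped if its hash is within
    max_distance bits of an already kept one (threshold assumed)."""
    kept: List[int] = []
    hashes: List[int] = []
    for idx, img in enumerate(images):
        h = phash(img)
        if all(hamming(h, prev) > max_distance for prev in hashes):
            kept.append(idx)
            hashes.append(h)
    return kept
```

A uniform brightness change only shifts the DC term, so a brightened copy hashes identically and is dropped, while unrelated images typically differ in a large fraction of the 64 bits.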
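MR is assumed here to follow the usual Caltech/CityPersons definition (MR^-2): the miss rate is sampled at nine FPPI values evenly spaced in log space over [10^-2, 10^0] and averaged geometrically. A minimal sketch, assuming detections have already been matched to ground truth (the `is_tp` flags) and ignoring ignore regions and score ties. `log_average_miss_rate` is a hypothetical helper, not an official evaluation script.

```python
import math
from typing import List, Sequence


def log_average_miss_rate(scores: Sequence[float], is_tp: Sequence[bool],
                          num_gt: int, num_images: int,
                          num_refs: int = 9) -> float:
    """Geometric mean of the miss rate sampled at num_refs FPPI values
    log-spaced over [1e-2, 1e0] (Caltech-style MR^-2)."""
    if num_gt <= 0 or num_images <= 0 or num_refs < 2:
        raise ValueError("num_gt, num_images must be positive; num_refs >= 2")
    # Sweep the score threshold from high to low; each detection adds one
    # operating point (FPPI, miss rate).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fppi: List[float] = []
    miss: List[float] = []
    tp = fp = 0
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        fppi.append(fp / num_images)
        miss.append(1.0 - tp / num_gt)
    refs = [10 ** (-2 + 2 * k / (num_refs - 1)) for k in range(num_refs)]
    sampled: List[float] = []
    for ref in refs:
        mr = 1.0  # no operating point at or below this FPPI: everything missed
        for f, m in zip(fppi, miss):
            if f > ref:
                break  # FPPI only grows along the sweep
            mr = m     # miss rate only shrinks, so the last qualifying point is best
        sampled.append(mr)
    # Clamp to avoid log(0) when every ground truth is found.
    return math.exp(sum(math.log(max(m, 1e-10)) for m in sampled) / num_refs)
```

For example, with 8 images, 4 ground-truth persons, and detections scored 0.9/0.8/0.7 that are TP/FP/TP, five reference points see a miss rate of 0.75 and four see 0.5, giving an MR of about 0.626.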
Notes