Stanford Dogs Saliency Maps
We built a top-down saliency dataset consisting of eye-gaze data recorded from multiple human subjects as they observed dog images taken from the Stanford Dogs dataset, a collection of 20,580 images of dogs from 120 breeds (about 170 images per class). From the whole Stanford Dogs dataset, we used a subset of 9,861 images that keeps the original class distribution. The eye-gaze acquisition protocol involved 12 subjects, who were asked to look at images shown on a computer screen and try to identify the dog breed.
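One way to draw a subset that keeps the class distribution is per-class (stratified) sampling. The sketch below is an illustration only, not the procedure actually used to select the 9,861 images: the function name `stratified_subset`, the `seed` parameter, and the per-class rounding rule are all assumptions. Because each class count is rounded independently, the total can differ from the requested size by a few images.

```python
import random
from collections import defaultdict


def stratified_subset(labels, subset_size, seed=0):
    """Return sorted indices of a subset that keeps per-class proportions.

    `labels[i]` is the class (breed) of image i. Hypothetical helper,
    not the selection procedure described in the text.
    """
    rng = random.Random(seed)

    # Group image indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Sample the same fraction from every class.
    fraction = subset_size / len(labels)
    chosen = []
    for label in sorted(by_class):
        members = by_class[label]
        k = round(len(members) * fraction)  # per-class rounding (assumption)
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)
```

For the figures quoted above, the sampling fraction would be 9,861 / 20,580, or roughly 48% of each breed.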
To enforce top-down saliency, images were initially blurred and gradually enhanced until subjects were able to recognize the dog breeds. Thus, each image was shown for as long as the subject needed to identify the breed, and eye-gaze data were recorded with a 60-Hz Tobii T60 eye-tracker.
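The text does not state how the recorded gaze samples are converted into saliency maps. A common approach, sketched here purely as an assumption, is to place a Gaussian at each gaze sample and normalize the accumulated map to [0, 1]. The function name `gaze_to_saliency`, the `sigma` value, and the pixel-coordinate input format are hypothetical. At 60 Hz each sample covers about 1/60 s, so locations that were looked at longer collect more samples and end up with higher values.

```python
import math


def gaze_to_saliency(samples, width, height, sigma=2.0):
    """Build a saliency map from gaze samples (hypothetical sketch).

    `samples` is a list of (x, y) pixel coordinates, one per tracker
    reading. Returns a height x width list of lists scaled to [0, 1].
    """
    grid = [[0.0] * width for _ in range(height)]

    # Add an isotropic Gaussian centered on every gaze sample.
    for gx, gy in samples:
        for y in range(height):
            for x in range(width):
                d2 = (x - gx) ** 2 + (y - gy) ** 2
                grid[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))

    # Normalize so the most-attended location has value 1.
    peak = max(max(row) for row in grid)
    if peak > 0:
        grid = [[v / peak for v in row] for row in grid]
    return grid
```

The nested loops are fine for a small illustration; for full-resolution images a vectorized or filtered implementation would be the more practical choice.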