Generating reliable video annotations by exploiting the crowd
In computer vision and machine learning, annotated datasets are crucial for both learning and performance evaluation. However, annotating visual datasets is a tedious and error-prone task, and computer vision researchers usually dedicate a large share of their time to collecting and generating annotations, which most of the time cannot be reused in other scenarios. In this paper, we propose a simple but effective interactive video object segmentation method that exploits large amounts of noisy data gathered from a crowd of users playing a web game. Experimental results on two challenging video benchmarks show that reliable object segmentations in videos can be generated with little human effort, achieving an accuracy comparable to that obtained with manually labeled annotations and outperforming state-of-the-art video object segmentation approaches.