A diversity-based search approach to support annotation of a large fish image dataset
Label propagation consists in annotating an unlabeled dataset starting from a set of labeled items. However, most current methods exploit only image similarity between labeled and unlabeled images in order to find propagation candidates, which may result, especially in very large datasets, in retrieving mostly near-duplicate images. While such approaches are technically correct, as they maximize the propagation precision, the resulting annotated dataset may not be as useful, since they lack intra-class variability within the set of images sharing the same label. In this paper, we propose an approach for label propagation which favors the propagation of an object’s label to a set of images representing as many different views of that object as possible, while at the same time preserving the relevance of the retrieved items to the query. Our method is based on a diversity-based clustering technique using a random forest framework and a label propagation approach which is able to effectively and efficiently propagate annotations using a similarity-based approach operating on clusters. The method was tested on a very large dataset of fish images achieving good performance in automated label propagation, ensuring diversification of the annotated items while preserving precision.