Large Scale Data Processing in Ecology: A Case Study on Long-Term Underwater Video Monitoring
Ecology is nowadays an interdisciplinary, collaborative and data-intensive science. Discovering, integrating and analysing the data produced daily is therefore necessary to support researchers investigating complex questions, ranging from single particles to animals to the biosphere. As a consequence, ecology-related multimedia content has been produced on a massive scale in recent years: for example, the Xeno-canto project and the Pl@ntNet project have collected, respectively, 140,000 audio recordings of 8,700 bird species and about 60,000 images covering thousands of plant species, for use by scientists or professionals. Manual analysis of this volume of data is infeasible, so automatic analysis tools combined with high-performance computing (HPC) solutions are in heavy demand for making sense of such big ecological data. In this paper we present a case study of large-scale video processing on HPC facilities for underwater fish monitoring, carried out in the context of the Fish4Knowledge project, in which a system to analyse long-term underwater camera footage was developed. The paper reports on the hardware/software architecture employed, the design and deployment of the parallel job manager, and the problems encountered throughout the process, from load balancing to job submission policies to bottlenecks.