Discovering biological knowledge by integrating high-throughput data and scientific literature on the cloud
In this paper, we present a bioinformatics knowledge discovery tool for extracting and validating associations between biological entities. By mining specialized scientific literature, the tool not only generates biological hypotheses in the form of associations between genes, proteins, miRNAs and diseases but also validates the plausibility of these associations against high-throughput biological data (e.g. microarray) and annotated databases (e.g. Gene Ontology). Both knowledge discovery and validation are carried out by exploiting the capabilities of the Cloud, which allowed us to derive and check the validity of thousands of biological associations in a reasonable amount of time. The system was tested on a dataset containing more than 1000 gene–disease associations, achieving an average recall of about 71% and outperforming existing approaches. The results also showed that porting a data-intensive application to an Infrastructure as a Service cloud environment significantly boosts the application's efficiency. Copyright © 2013 John Wiley & Sons, Ltd.