CLOUDFLOW - Enabling Faster Biomedical Pipelines with Mapreduce and Spark

Forer, Lukas; Afgan, Enis; Weissenteiner, Hansi; Davidović, Davor; Specht, Guenther; Kronenberg, Florian; Schoenherr, Sebastian

Data source: CROSBI

CLOUDFLOW - Enabling Faster Biomedical Pipelines with Mapreduce and Spark (CROSBI ID 227776)

Journal contribution | original scientific paper | international peer review

Forer, Lukas ; Afgan, Enis ; Weissenteiner, Hansi ; Davidović, Davor ; Specht, Guenther ; Kronenberg, Florian ; Schoenherr, Sebastian CLOUDFLOW - Enabling Faster Biomedical Pipelines with Mapreduce and Spark // Scalable Computing. Practice and Experience, 17 (2016), 2; 103-114. doi: 10.12694/scpe.v17i2.1159

Statement of responsibility

Authors

Forer, Lukas ; Afgan, Enis ; Weissenteiner, Hansi ; Davidović, Davor ; Specht, Guenther ; Kronenberg, Florian ; Schoenherr, Sebastian

Basic data in the original language
Basic data in other languages

Language

English

Title

CLOUDFLOW - Enabling Faster Biomedical Pipelines with Mapreduce and Spark

Abstract

For many years, Apache Hadoop has been used as a synonym for processing data in the MapReduce fashion. However, due to the complexity of developing MapReduce applications, adoption of this paradigm in genetics has been limited. To alleviate some of these issues, we previously developed Cloudflow, a high-level pipeline framework that lets users create sophisticated biomedical pipelines from predefined code blocks, while the framework automatically translates them into the MapReduce execution model. With the introduction of the YARN resource management layer, new computational processing models such as Apache Spark are now pluggable into the Hadoop ecosystem. In this paper we describe the extension of Cloudflow to support Apache Spark without any adaptations to already implemented pipelines. The described performance evaluation demonstrates that Spark can bring an additional boost for analysing next-generation sequencing (NGS) data to the field of genetics. The Cloudflow framework is open source and freely available at https://github.com/genepi/cloudflow.
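The idea in the abstract, that a pipeline is described once and then handed to different execution models without being rewritten, can be illustrated with a small sketch. This is not Cloudflow's API: the `Pipeline` class, both `run_*` functions, and the base-counting example are hypothetical names invented here, and plain Python stands in for Hadoop MapReduce and Spark. It only shows how one step list might be interpreted by two interchangeable backends.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain


class Pipeline:
    """Hypothetical backend-agnostic pipeline: an ordered list of (kind, fn) steps."""

    def __init__(self):
        self.steps = []

    def flat_map(self, fn):
        self.steps.append(("flat_map", fn))
        return self

    def filter(self, predicate):
        self.steps.append(("filter", predicate))
        return self

    def reduce_by_key(self, fn):
        self.steps.append(("reduce_by_key", fn))
        return self


def run_mapreduce_style(pipeline, records):
    """Stand-in backend 1: every step is a separate pass with a materialised result."""
    data = list(records)
    for kind, fn in pipeline.steps:
        if kind == "flat_map":
            data = [out for rec in data for out in fn(rec)]
        elif kind == "filter":
            data = [rec for rec in data if fn(rec)]
        elif kind == "reduce_by_key":
            groups = defaultdict(list)  # explicit "shuffle": collect values per key
            for key, value in data:
                groups[key].append(value)
            data = [(key, reduce(fn, values)) for key, values in groups.items()]
    return dict(data)


def _fold_by_key(fn, stream):
    """Fold values per key in one streaming pass, without building value lists."""
    acc = {}
    for key, value in stream:
        acc[key] = fn(acc[key], value) if key in acc else value
    return iter(acc.items())


def run_in_memory_style(pipeline, records):
    """Stand-in backend 2: record-wise steps are chained lazily between aggregations."""
    stream = iter(records)
    for kind, fn in pipeline.steps:
        if kind == "flat_map":
            stream = chain.from_iterable(map(fn, stream))
        elif kind == "filter":
            stream = filter(fn, stream)
        elif kind == "reduce_by_key":
            stream = _fold_by_key(fn, stream)
    return dict(stream)


def base_count_pipeline():
    """Toy example: count bases across sequencing reads, ignoring ambiguous 'N' calls."""
    return (
        Pipeline()
        .flat_map(lambda read: [(base, 1) for base in read])
        .filter(lambda pair: pair[0] != "N")
        .reduce_by_key(lambda a, b: a + b)
    )


if __name__ == "__main__":
    reads = ["ACGT", "AAC", "GGN"]
    pipeline = base_count_pipeline()
    print(run_mapreduce_style(pipeline, reads))
    print(run_in_memory_style(pipeline, reads))
```

The pipeline definition is identical for both calls; only the function that interprets the step list changes, which is the property the abstract attributes to the Spark extension.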

Keywords

Apache YARN ; Pipeline Framework ; Spark ; Cloud Computing

Note

not recorded

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Publication data

Journal

Scalable Computing. Practice and Experience

Volume (issue)

17 (2)

Year

2016

Pages

103-114

Publication status

published

e-ISSN

1895-1767

DOI

10.12694/scpe.v17i2.1159

Related entities

Related persons

Davor Davidović (author)

Enis Afgan (author)

Related institutions

Institut Ruđer Bošković (098) (author's institution)

Field

Computing

Links

doi.org

scpe.org

fulir.irb.hr

Indexing

Scopus

Web of Science Core Collection, Emerging Sources Citation Index (WoSCC-ESCI)