PySpark and RDKit: Moving towards Big Data in Cheminformatics

Lovrić, Mario; Molero, José Manuel; Kern, Roman

Data source: CROSBI

PySpark and RDKit: Moving towards Big Data in Cheminformatics (CROSBI ID 296567)

Journal contribution | other | international peer review

Lovrić, Mario; Molero, José Manuel; Kern, Roman. PySpark and RDKit: Moving towards Big Data in Cheminformatics // Molecular Informatics, 38 (2019), 6; 1800082, 5. doi: 10.1002/minf.201800082

Responsibility details

Authors

Lovrić, Mario ; Molero, José Manuel ; Kern, Roman

Basic data in the original language

Language

English

Title

PySpark and RDKit: Moving towards Big Data in Cheminformatics

Abstract

The authors present an implementation of the cheminformatics toolkit RDKit in a distributed computing environment, Apache Hadoop. Together with the Apache Spark analytics engine, wrapped by PySpark, resources from scalable commodity hardware can be employed for cheminformatics calculations and query operations, requiring only basic knowledge of Python programming and an understanding of resilient distributed datasets (RDDs). Three use cases of cheminformatics computing in Spark on the Hadoop cluster are presented: querying substructures, calculating fingerprint similarity, and calculating molecular descriptors. The source code for the PySpark-RDKit implementation is provided. The use cases showed that Spark provides reasonable scalability, depending on the use case, and can be a suitable choice for datasets too big to be processed with current low-end workstations.
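The three use cases named in the abstract can be illustrated with a minimal sketch. This is not the authors' published source code; it only assumes that molecules are stored as SMILES strings in an RDD and that RDKit is installed on every executor. All function names (`has_substructure`, `tanimoto_to_query`, `basic_descriptors`, `query_substructure`, `similarity_to`, `describe`, `run_local_demo`) are hypothetical, and the choice of Morgan fingerprints and of the two descriptors is an assumption made for illustration.

```python
# Hypothetical sketch: per-record RDKit functions applied through RDD
# transformations. RDKit is imported inside each function so that every
# executor loads it on first use.

def has_substructure(smiles, smarts):
    """Return True if the molecule parsed from `smiles` matches `smarts`."""
    from rdkit import Chem
    mol = Chem.MolFromSmiles(smiles)
    pattern = Chem.MolFromSmarts(smarts)
    return mol is not None and pattern is not None and mol.HasSubstructMatch(pattern)


def tanimoto_to_query(smiles, query_smiles, radius=2, n_bits=2048):
    """Tanimoto similarity of Morgan fingerprints; None for unparsable input."""
    from rdkit import Chem, DataStructs
    from rdkit.Chem import rdFingerprintGenerator
    mol = Chem.MolFromSmiles(smiles)
    query = Chem.MolFromSmiles(query_smiles)
    if mol is None or query is None:
        return None
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    # The query fingerprint is recomputed per record; acceptable for a sketch.
    return DataStructs.TanimotoSimilarity(gen.GetFingerprint(mol), gen.GetFingerprint(query))


def basic_descriptors(smiles):
    """Two example descriptors (an assumed selection); None for unparsable input."""
    from rdkit import Chem
    from rdkit.Chem import Descriptors
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return {"MolWt": Descriptors.MolWt(mol), "MolLogP": Descriptors.MolLogP(mol)}


def query_substructure(rdd, smarts):
    """Use case 1: keep only SMILES records that contain the substructure."""
    return rdd.filter(lambda s: has_substructure(s, smarts))


def similarity_to(rdd, query_smiles):
    """Use case 2: pair each SMILES record with its similarity to the query."""
    return rdd.map(lambda s: (s, tanimoto_to_query(s, query_smiles)))


def describe(rdd):
    """Use case 3: pair each SMILES record with its descriptor dictionary."""
    return rdd.map(lambda s: (s, basic_descriptors(s)))


def run_local_demo():
    """Run the three use cases on a tiny in-memory RDD in local mode."""
    from pyspark import SparkContext
    sc = SparkContext("local[2]", "rdkit-sketch")
    try:
        rdd = sc.parallelize(["CCO", "c1ccccc1", "c1ccccc1O", "CC(=O)O"])
        print(query_substructure(rdd, "c1ccccc1").collect())
        print(similarity_to(rdd, "c1ccccc1").collect())
        print(describe(rdd).collect())
    finally:
        sc.stop()
```

Calling `run_local_demo()` requires a working PySpark installation. On a Hadoop cluster the RDD would more plausibly be created from a file of SMILES strings, for example with `sc.textFile(...)`, than from an in-memory list. The transformation functions only rely on the `filter` and `map` methods of the RDD they are given.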

Keywords

QSAR ; Hadoop ; Apache Spark ; Python ; pandas

Note

not recorded

Basic data in other languages

Language, title, abstract, keywords, note: not recorded

Publication details

Journal

Molecular Informatics

Volume (issue)

38 (6)

Year

2019

Article number

1800082

Number of pages

Publication status

published

ISSN

1868-1743

DOI

10.1002/minf.201800082

Related entities

Related persons

Mario Lovrić (author)

Related institutions

Dječja bolnica Srebrnjak (277) (author's institution)

Field

Interdisciplinary natural sciences, Chemistry

Links

doi.org

onlinelibrary.wiley.com

Indexing

Scopus

Current Contents Connect (CCC)

Medline

Web of Science Core Collection, Science Citation Index Expanded (WoSCC-SCI-Exp)

Web of Science Core Collection, SCI-Exp, SSCI & A&HCI (WoSCC-SCI-Exp, SSCI, A&HCI)