Bibliographic record number: 1136005

PySpark and RDKit: Moving towards Big Data in Cheminformatics


Lovrić, Mario; Molero, José Manuel; Kern, Roman
PySpark and RDKit: Moving towards Big Data in Cheminformatics // Molecular Informatics, 38 (2019), 6; 1800082, 5 doi:10.1002/minf.201800082 (international peer review, other)


CROSBI ID: 1136005

Title
PySpark and RDKit: Moving towards Big Data in Cheminformatics

Authors
Lovrić, Mario ; Molero, José Manuel ; Kern, Roman

Source
Molecular Informatics (1868-1743) 38 (2019), 6; 1800082, 5

Type, subtype and category of work
Journal articles, other, other

Keywords
QSAR ; Hadoop ; Apache Spark ; Python ; pandas

Abstract
The authors present an implementation of the cheminformatics toolkit RDKit in a distributed computing environment, Apache Hadoop. Together with the Apache Spark analytics engine, wrapped by PySpark, resources from scalable commodity hardware can be employed for cheminformatics calculations and query operations, requiring only basic knowledge of Python programming and an understanding of resilient distributed datasets (RDDs). Three use cases of cheminformatics computing in Spark on the Hadoop cluster are presented: querying substructures, calculating fingerprint similarity, and calculating molecular descriptors. The source code for the PySpark-RDKit implementation is provided. The use cases showed that Spark provides reasonable scalability, depending on the use case, and can be a suitable choice for datasets too big to be processed on current low-end workstations.
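
The fingerprint-similarity use case named in the abstract can be illustrated with a minimal plain-Python sketch. This is not the authors' PySpark-RDKit code: `toy_fingerprint` is a hypothetical stand-in for a real RDKit fingerprint, and the `partition` and `score_partition` helpers are assumptions that only mimic how a Spark RDD splits a molecule library into partitions and maps a function over each one. The Tanimoto coefficient itself is the standard definition, shared bits divided by the union of bits.

```python
import zlib


def toy_fingerprint(smiles, n_bits=64):
    """Hypothetical stand-in for a chemical fingerprint: hash each
    character bigram of the SMILES string to one of n_bits positions."""
    return frozenset(
        zlib.crc32(smiles[i:i + 2].encode()) % n_bits
        for i in range(len(smiles) - 1)
    )


def tanimoto(a, b):
    """Tanimoto coefficient between two sets of 'on' bit positions."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def partition(items, n_parts):
    """Split a list into n_parts chunks, loosely like RDD partitions."""
    return [items[i::n_parts] for i in range(n_parts)]


def score_partition(chunk, query_fp):
    """Per-partition work: fingerprint each molecule, score it against the query."""
    for smiles in chunk:
        yield smiles, tanimoto(toy_fingerprint(smiles), query_fp)


def similarity_search(library, query, threshold=0.5, n_parts=2):
    """Return (smiles, score) pairs at or above threshold, best first."""
    query_fp = toy_fingerprint(query)
    hits = []
    for chunk in partition(library, n_parts):  # each chunk is independent work
        hits.extend(
            (s, t) for s, t in score_partition(chunk, query_fp) if t >= threshold
        )
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    library = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]
    for smiles, score in similarity_search(library, "CCO", threshold=0.0):
        print(f"{smiles}\t{score:.2f}")
```

Because each chunk is scored independently, the loop over partitions is the part a distributed engine would run in parallel; in a real setup the toy fingerprint would be replaced by a toolkit-generated one.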

Original language
English

Scientific fields
Chemistry, Interdisciplinary natural sciences



RELATED TO THIS WORK


Institutions:
Dječja bolnica Srebrnjak

Profiles:

Mario Lovrić (author)

Links to the full text:

doi onlinelibrary.wiley.com

Cite this publication:

Lovrić, Mario; Molero, José Manuel; Kern, Roman
PySpark and RDKit: Moving towards Big Data in Cheminformatics // Molecular Informatics, 38 (2019), 6; 1800082, 5 doi:10.1002/minf.201800082 (international peer review, other)
Lovrić, M., Molero, J. & Kern, R. (2019) PySpark and RDKit: Moving towards Big Data in Cheminformatics. Molecular Informatics, 38 (6), 1800082, 5 doi:10.1002/minf.201800082.
@article{article,
  author = {Lovri\'{c}, Mario and Molero, Jos\'{e} Manuel and Kern, Roman},
  title = {PySpark and RDKit: Moving towards Big Data in Cheminformatics},
  journal = {Molecular Informatics},
  year = {2019},
  volume = {38},
  number = {6},
  pages = {5},
  chapter = {1800082},
  chapternumber = {1800082},
  issn = {1868-1743},
  doi = {10.1002/minf.201800082},
  keywords = {QSAR, Hadoop, Apache Spark, Python, pandas}
}

The journal is indexed in:


  • Current Contents Connect (CCC)
  • Web of Science Core Collection (WoSCC)
    • Science Citation Index Expanded (SCI-EXP)
    • SCI-EXP, SSCI and/or A&HCI
  • Scopus
  • MEDLINE

