
Bibliographic record no. 1139075

Should We Embed in Chemistry? A Comparison of Unsupervised Transfer Learning with PCA, UMAP, and VAE on Molecular Fingerprints


Lovrić, Mario; Đuričić, Tomislav; Tran, Han T. N.; Hussain, Hussain; Lacić, Emanuel; Rasmussen, Morten A.; Kern, Roman
Should We Embed in Chemistry? A Comparison of Unsupervised Transfer Learning with PCA, UMAP, and VAE on Molecular Fingerprints // Pharmaceuticals, 14 (2021), 8; 758, 17 doi:10.3390/ph14080758 (international peer review, article, other)


CROSBI ID: 1139075

Title
Should We Embed in Chemistry? A Comparison of Unsupervised Transfer Learning with PCA, UMAP, and VAE on Molecular Fingerprints

Authors
Lovrić, Mario ; Đuričić, Tomislav ; Tran, Han T. N. ; Hussain, Hussain ; Lacić, Emanuel ; Rasmussen, Morten A. ; Kern, Roman

Source
Pharmaceuticals (1424-8247) 14 (2021), 8; 758, 17

Type, subtype and category of work
Journal papers, article, other

Keywords
manifold learning ; machine learning ; rdkit ; embeddings ; Tox21 ; principal component analysis ; autoencoder

Abstract
Methods for dimensionality reduction are showing significant contributions to knowledge generation in high-dimensional modeling scenarios throughout many disciplines. By achieving a lower-dimensional representation (also called embedding), fewer computing resources are needed in downstream machine learning tasks, thus leading to a faster training time, lower complexity, and statistical flexibility. In this work, we investigate the utility of three prominent unsupervised embedding techniques, namely principal component analysis (PCA), uniform manifold approximation and projection (UMAP), and variational autoencoders (VAEs), for solving classification tasks in the domain of toxicology. To this end, we compare these embedding techniques against a set of molecular fingerprint-based models that do not utilize additional preprocessing of features. Inspired by the success of transfer learning in several fields, we further study the performance of embedders when trained on an external dataset of chemical compounds. To gain a better understanding of their characteristics, we evaluate the embedders with different embedding dimensionalities, and with different sizes of the external dataset. Our findings show that the recently popularized UMAP approach can be utilized alongside known techniques such as PCA and VAE as a pre-compression technique in the toxicology domain. Nevertheless, the generative model of VAE shows an advantage in pre-compressing the data with respect to classification accuracy.
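The transfer setup described in the abstract (fit an unsupervised embedder on an external set of compounds, then reuse it to compress the fingerprints of a labeled toxicity task before classification) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: random synthetic bit vectors stand in for real RDKit fingerprints and Tox21 labels, only PCA is implemented (via NumPy's SVD), and a hand-written logistic regression is a stand-in classifier. All helper names (`make_fingerprints`, `fit_pca`, `transform`, `train_logreg`, `accuracy`, `compare`) are hypothetical.

```python
import numpy as np


def make_fingerprints(rng, n, n_bits=256, density=0.1):
    """Hypothetical stand-in for binary molecular fingerprints (random bits, not real molecules)."""
    return (rng.random((n, n_bits)) < density).astype(float)


def fit_pca(X, n_components):
    """Fit PCA via SVD; returns the feature mean and an (n_bits, n_components) projection."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T


def transform(X, mean, components):
    """Project data into the embedding that was learned on the external set."""
    return (X - mean) @ components


def train_logreg(X, y, lr=0.5, epochs=300):
    """Batch-gradient-descent logistic regression (a simple stand-in classifier)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        # Clip logits to avoid overflow in exp.
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w + b, -30, 30)))
        err = p - y
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


def accuracy(X, y, w, b):
    """Fraction of samples whose thresholded prediction matches the label."""
    return float(np.mean((X @ w + b > 0) == (y == 1)))


def compare(dims=(8, 32), n_external=2000, seed=0):
    """Raw-fingerprint baseline vs. PCA embeddings fitted on an external unlabeled set."""
    rng = np.random.default_rng(seed)
    # Unlabeled "external" compounds: used only to fit the embedder (transfer setting).
    X_ext = make_fingerprints(rng, n_external)
    # Labeled target task with a synthetic label driven by the first 10 bits.
    X = make_fingerprints(rng, 1000)
    y = (X[:, :10].sum(axis=1) >= 1).astype(float)
    tr, te = slice(0, 800), slice(800, None)

    results = {}
    w, b = train_logreg(X[tr], y[tr])
    results["raw"] = accuracy(X[te], y[te], w, b)
    for k in dims:
        mean, comps = fit_pca(X_ext, k)
        Z = transform(X, mean, comps)
        w, b = train_logreg(Z[tr], y[tr])
        results[k] = accuracy(Z[te], y[te], w, b)
    return results


if __name__ == "__main__":
    print(compare(dims=(8, 32, 64), n_external=500))
```

Varying `dims` and `n_external` mirrors the two factors the abstract says were studied (embedding dimensionality and external dataset size); a UMAP model or a VAE encoder would take the place of `fit_pca`/`transform` in the same fit-on-external, transform-on-target pattern. The numbers this sketch prints come from synthetic data and say nothing about the paper's results.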

Original language
English

Scientific fields
Chemistry, Interdisciplinary Natural Sciences, Information and Communication Sciences



RELATED INFORMATION


Institutions:
Institut za antropologiju

Profiles:

Mario Lovrić (author)

Links to the full text:

Full-text access: doi:10.3390/ph14080758, www.mdpi.com

Cite this publication:

Lovrić, M., Đuričić, T., Tran, H., Hussain, H., Lacić, E., Rasmussen, M. & Kern, R. (2021) Should We Embed in Chemistry? A Comparison of Unsupervised Transfer Learning with PCA, UMAP, and VAE on Molecular Fingerprints. Pharmaceuticals, 14 (8), 758, 17 doi:10.3390/ph14080758.
@article{article,
  author = {Lovri\'{c}, Mario and \DJuri\v{c}i\'{c}, Tomislav and Tran, Han T. N. and Hussain, Hussain and Laci\'{c}, Emanuel and Rasmussen, Morten A. and Kern, Roman},
  title = {Should We Embed in Chemistry? A Comparison of Unsupervised Transfer Learning with {PCA}, {UMAP}, and {VAE} on Molecular Fingerprints},
  journal = {Pharmaceuticals},
  year = {2021},
  volume = {14},
  number = {8},
  pages = {758},
  issn = {1424-8247},
  doi = {10.3390/ph14080758},
  keywords = {manifold learning, machine learning, rdkit, embeddings, Tox21, principal component analysis, autoencoder}
}

The journal is indexed in:


  • Current Contents Connect (CCC)
  • Web of Science Core Collection (WoSCC)
    • Science Citation Index Expanded (SCI-EXP)
    • SCI-EXP, SSCI and/or A&HCI
  • Scopus

