Bibliographic record no. 1126403

Machine learning in prediction of intrinsic aqueous solubility of drug‐like compounds: Generalization, complexity, or predictive ability?


Lovrić, Mario; Pavlović, Kristina; Žuvela, Petar; Spataru, Adrian; Lučić, Bono; Kern, Roman; Wong, Ming Wah
Machine learning in prediction of intrinsic aqueous solubility of drug‐like compounds: Generalization, complexity, or predictive ability? // Journal of chemometrics, 35 (2021), 7-8; e3349, 16 doi:10.1002/cem.3349 (international peer review, article, scientific)


CROSBI ID: 1126403

Source
Journal of chemometrics (ISSN 0886-9383) 35 (2021), 7-8; e3349, 16

Type, subtype and category of work
Journal papers, article, scientific

Keywords
consensus modeling ; LASSO ; LightGBM ; PCA ; permutation importance ; QSAR ; random forests

Abstract
We present a collection of publicly available intrinsic aqueous solubility data for 829 drug‐like compounds. Four machine learning algorithms (random forests [RF], LightGBM, partial least squares, and least absolute shrinkage and selection operator [LASSO]), coupled with multistage permutation importance for feature selection and Bayesian hyperparameter optimization, were used to predict solubility from chemical structural information. Our results show that LASSO yielded the best predictive ability on an external test set, with a root mean square error (RMSE) (test) of 0.70 log points, an R²(test) of 0.80, and 105 features. When the number of descriptors is also taken into account, an RF model achieves the best balance between complexity and predictive ability, with an RMSE(test) of 0.72 log points, an R²(test) of 0.78, and only 17 features. On a more aggressive test set (principal component analysis [PCA]‐based split), the RF model generalized better. We propose a ranking score for choosing the best model, since test set performance is only one of the factors in creating an applicable model. The ranking score is a weighted combination of generalization, number of features, and test performance. A consensus model built from the two best learners exhibited the best predictive ability and generalization, with an RMSE(test) of 0.67 log points and an R²(test) of 0.81.
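The abstract relies on two mechanics that are easy to gloss over in prose: scoring regressors by RMSE and R² on a test set, and pooling the two best learners into a consensus model. The Python sketch below illustrates both under stated assumptions. It assumes the consensus is an equal-weight average of per-compound predictions, which this record does not specify. The `ranking_score` function is purely hypothetical: the record says only that the score weights generalization, number of features, and test performance, so the specific terms (a train/test RMSE gap as a stand-in for generalization, the fraction of candidate descriptors used) and the default weights are illustrative placeholders, not the authors' formula. The toy values in the demo are invented and do not come from the 829-compound dataset.

```python
import math
from typing import List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, in the units of y (log solubility points here)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (undefined if y is constant)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def consensus(*predictions: Sequence[float]) -> List[float]:
    """ASSUMPTION: consensus = unweighted mean of each model's prediction per compound."""
    return [sum(vals) / len(vals) for vals in zip(*predictions)]


def ranking_score(
    rmse_test: float,
    rmse_train: float,
    n_features: int,
    max_features: int,
    weights: Tuple[float, float, float] = (0.5, 0.25, 0.25),
) -> float:
    """HYPOTHETICAL ranking score; lower is better.

    Terms and default weights are illustrative placeholders, not the
    published formula: test RMSE, the train/test RMSE gap as a proxy
    for generalization, and the fraction of candidate descriptors used.
    """
    w_test, w_gen, w_feat = weights
    gap = abs(rmse_test - rmse_train)
    return w_test * rmse_test + w_gen * gap + w_feat * (n_features / max_features)


if __name__ == "__main__":
    # Toy log-solubility values and predictions, invented for illustration only.
    y = [-2.0, -3.5, -1.0, -4.2]
    model_a = [-2.3, -3.1, -1.4, -4.0]
    model_b = [-1.8, -3.8, -0.7, -4.5]
    model_c = consensus(model_a, model_b)
    for name, pred in [("A", model_a), ("B", model_b), ("consensus", model_c)]:
        print(f"{name}: RMSE={rmse(y, pred):.3f}  R2={r2(y, pred):.3f}")
```

In this toy case the two models err in opposite directions, so averaging cancels much of the error. In practice a consensus helps only to the extent that the individual models' errors are not strongly correlated.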

Original language
English

Scientific fields
Chemistry, Interdisciplinary natural sciences, Computer science



ASSOCIATED RECORDS


Projects:
--KK.01.1.1.01.009 - Advanced methods and technologies in data science and cooperative systems (DATACROSS) (Šmuc, Tomislav; Lončarić, Sven; Petrović, Ivan; Jokić, Andrej; Palunko, Ivana) (CroRIS)

Institutions:
Institut "Ruđer Bošković", Zagreb

Profiles:

Bono Lučić (author)

Mario Lovrić (author)

Cite this publication:

Lovrić, M., Pavlović, K., Žuvela, P., Spataru, A., Lučić, B., Kern, R. & Wong, M. (2021) Machine learning in prediction of intrinsic aqueous solubility of drug‐like compounds: Generalization, complexity, or predictive ability? Journal of chemometrics, 35 (7-8), e3349, 16 doi:10.1002/cem.3349.
@article{article,
  author = {Lovri\'{c}, Mario and Pavlovi\'{c}, Kristina and \v{Z}uvela, Petar and Spataru, Adrian and Lu\v{c}i\'{c}, Bono and Kern, Roman and Wong, Ming Wah},
  title = {Machine learning in prediction of intrinsic aqueous solubility of drug‐like compounds: Generalization, complexity, or predictive ability?},
  journal = {Journal of chemometrics},
  year = {2021},
  volume = {35},
  number = {7-8},
  chapter = {e3349},
  pages = {16},
  issn = {0886-9383},
  doi = {10.1002/cem.3349},
  keywords = {consensus modeling, LASSO, LightGBM, PCA, permutation importance, QSAR, random forests}
}

Journal indexed in:


  • Current Contents Connect (CCC)
  • Web of Science Core Collection (WoSCC)
    • Science Citation Index Expanded (SCI-EXP)
    • SCI-EXP, SSCI and/or A&HCI
  • Scopus

