Bibliographic record no.: 404027
Improvement of Ensemble of Multi-Regression Structure-Toxicity Models by Clustering of Molecules in Descriptor Space
Improvement of Ensemble of Multi-Regression Structure-Toxicity Models by Clustering of Molecules in Descriptor Space // International Conference of Computational Methods in Sciences and Engineering 2008 ; Special Volume of the American Institute of Physics (AIP) - Conference Proceedings of ICCMSE 2008. Vol. 1148 / Simos, Theodore (ed.).
Melville (NY): American Institute of Physics (AIP), 2009. pp. 408-411 (poster, international peer review, full paper (in extenso), scientific)
CROSBI ID: 404027
Title
Improvement of Ensemble of Multi-Regression Structure-Toxicity Models by Clustering of Molecules in Descriptor Space
Authors
Bašic, Ivan ; Lučić, Bono ; Nikolić, Sonja ; Papeš-Šokčević, Lidija ; Nadramija, Damir
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
International Conference of Computational Methods in Sciences and Engineering 2008 ; Special Volume of the American Institute of Physics (AIP) - Conference Proceedings of ICCMSE 2008. Vol. 1148
/ Simos, Theodore - Melville (NY) : American Institute of Physics (AIP), 2009, 408-411
ISBN
978-0-7354-0685-8
Conference
International Conference of Computational Methods in Sciences and Engineering 2008
Place and date
Crete, Greece, 25.09.2008 - 30.09.2008
Type of participation
Poster
Type of review
International peer review
Keywords
Acute aquatic toxicity; Organic molecules; QSAR models; Molecular descriptors; Distance based similarity; Clustering of molecules; Ensemble of multi-regression models; Clustered ensembles
Abstract
For the data set published by Russom et al. (Environ. Toxicol. Chem. 16, 948-967 (1997)), containing 704 organic molecules with measured acute aquatic toxicity data (96-h LC50 tests), we calculated more than 1400 molecular descriptors with the Dragon 5.0 program.[1] After excluding descriptors with almost constant values and those with very low correlation with log LC50 on the training set, about 620 descriptors remained and were used in the modeling process. The data set was randomly partitioned into a training set of 560 molecules and a test set of 144 molecules. We developed and compared two kinds of ensembles of linear and nonlinear multi-regression models: (1) normal ensembles and (2) ensembles obtained by clustering molecules according to their similarity (clustered ensembles). Clustering was performed by calculating Euclidean distances between molecules in normalized descriptor space. In the clustered approach, the final model for a given test molecule was developed only on those training-set molecules that are close to it (measured by Euclidean distance in normalized descriptor space). Although the results obtained by normal ensembles are very good (e.g. a nonlinear ensemble of 8-descriptor models: r_train = 0.91, s_train = 0.54, r_test = 0.76, s_test = 0.80), a significant improvement is obtained by taking clustering of molecules into account when developing ensembles of linear models (e.g. an ensemble of 200 3-descriptor models: r_train = 0.91, s_train = 0.53, r_test = 0.836, s_test = 0.70; or an ensemble of 200 5-descriptor models: r_train = 0.94, s_train = 0.45, r_test = 0.84, s_test = 0.70). These results clearly indicate that information about similarity between molecules can improve structure-toxicity models, and we expect this to hold more generally.
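The descriptor filtering and the clustered-ensemble idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: descriptor columns are z-scored using training-set statistics, the k training molecules nearest to a test molecule (Euclidean distance) form its local training set, and the predictions of many small ordinary-least-squares models fitted on that local set are averaged. The filtering thresholds, the neighborhood size k, the random choice of descriptor subsets for each ensemble member, and all function names are hypothetical; the record does not state how the authors chose these.

```python
import math
import random
from statistics import mean, pstdev


def pearson_r(a, b):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def filter_descriptors(X, y, min_sd=1e-6, min_abs_r=0.1):
    """Indices of descriptor columns that are not (near-)constant and correlate
    at least weakly with y. Both thresholds are hypothetical."""
    return [j for j, col in enumerate(zip(*X))
            if pstdev(col) > min_sd and abs(pearson_r(col, y)) >= min_abs_r]


def column_stats(X):
    """Per-descriptor mean and standard deviation (1.0 for constant columns)."""
    cols = list(zip(*X))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def normalize(row, mus, sds):
    """z-score one descriptor vector using training-set statistics."""
    return [(v - m) / s for v, m, s in zip(row, mus, sds)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_indices(Z, zq, k):
    """Indices of the k training rows closest to zq (Euclidean distance)."""
    return sorted(range(len(Z)), key=lambda i: euclidean(Z[i], zq))[:k]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ols(X, y):
    """Ordinary least squares with intercept via the normal equations."""
    Xa = [[1.0] + list(row) for row in X]
    p = len(Xa[0])
    XtX = [[sum(r[i] * r[j] for r in Xa) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xa, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict_clustered_ensemble(X_train, y_train, x_query,
                               k=50, n_models=200, n_desc=3, seed=0):
    """Average of n_models small linear models, each fitted only on the k
    training molecules nearest to x_query in normalized descriptor space."""
    mus, sds = column_stats(X_train)
    Z = [normalize(r, mus, sds) for r in X_train]
    zq = normalize(x_query, mus, sds)
    local = nearest_indices(Z, zq, k)  # the "cluster" around the query molecule
    ys = [y_train[i] for i in local]
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        cols = rng.sample(range(len(zq)), n_desc)  # hypothetical: random subsets
        Xs = [[Z[i][c] for c in cols] for i in local]
        try:
            w = fit_ols(Xs, ys)
        except ValueError:
            continue  # skip degenerate descriptor subsets
        preds.append(w[0] + sum(wj * zq[c] for wj, c in zip(w[1:], cols)))
    if not preds:
        raise ValueError("no ensemble member could be fitted")
    return mean(preds)
```

In this sketch, `filter_descriptors` would be applied to the training set only, and `predict_clustered_ensemble` called once per test molecule, so every test molecule gets its own local training set. Normalizing before measuring distance keeps descriptors with large numeric ranges from dominating the Euclidean distance.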
Original language
English
Scientific fields
Chemistry, Computer Science
Note
DOI: 10.1063/1.3225331
RELATED TO THIS WORK
Projects:
079-0000000-3211 - Structure-activity relationship of flavonoids (Amić, Dragan, MZOS) (CroRIS)
098-1770495-2919 - Development of methods for modeling properties of bioactive molecules and proteins (Lučić, Bono, MZOS) (CroRIS)
Institutions:
Institut "Ruđer Bošković", Zagreb,
Nastavni zavod za javno zdravstvo "Dr. Andrija Štampar",
PLIVA HRVATSKA d.o.o.
Profiles:
Lidija Papeš Šokčević
(author)
Bono Lučić
(author)
Damir Nadramija
(author)
Ivan Bašic
(author)
Sonja Nikolić
(author)