Bibliographic record number: 212679
Nonlinear Multivariate Polynomial Ensembles in QSAR/QSPR
Nonlinear Multivariate Polynomial Ensembles in QSAR/QSPR // Proceedings of the International Conference of Computational Methods in Sciences and Engineering 2005 (ICCMSE 2005) / Simos, Theodore E. (ed.).
Athens: World Scientific Publishing, 2005. pp. - (poster, international peer review, abstract, other)
CROSBI ID: 212679
Title
Nonlinear Multivariate Polynomial Ensembles in QSAR/QSPR
Authors
Lučić, Bono ; Damir, Nadramija
Type, subtype and category of work
Conference abstracts, abstract, other
Source
Proceedings of the International Conference of Computational Methods in Sciences and Engineering 2005 (ICCMSE 2005)
/ Simos, Theodore E. - Athens : World Scientific Publishing, 2005
Conference
International Conference of Computational Methods in Sciences and Engineering 2005
Place and date
Athens, Greece, 21.10.2005 - 26.10.2005
Type of participation
Poster
Type of review
International peer review
Keywords
QSAR/QSPR modeling; selection of the most relevant molecular descriptors; ensembles of multivariate regression models; linear and nonlinear models
Abstract
In this study we demonstrate the use of ensembles of linear and nonlinear multivariate regression models, based on multivariate polynomials of the initial descriptors, in QSAR/QSPR modeling. The data sets, which varied significantly in the number of variables and the number of data points, had all been previously reported in the literature; molecular structures were either obtained from the authors of those publications or generated in our laboratories. The molecules in all data sets were encoded as SMILES and converted to 3D structures (SD files) by the CORINA program (www2.chemie.uni-erlangen.de/software/corina/). All descriptors were computed by the program DRAGON 2.1 (http://www.disat.unimib.it/chm/). Linear ensembles were built from multiple linear regression (MLR) models, and nonlinear ensembles consisted of multivariate polynomials constructed from controlled subsets of terms selected among the linear descriptors, their two-fold cross-products and squares, and the cubes of single descriptors only. Ensemble responses were computed as the mean, median, or weighted value of the responses of all intrinsic models. The models and ensembles discussed in this paper were constructed with NQSAR, a Windows console application that is available upon request. The results show a clear advantage of nonlinear ensembles over their linear counterparts when data sets contain 4 to 5 times more points than model coefficients. On the other hand, linear ensembles, which in general exhibit higher robustness and stability, are better suited to small data sets with many variables, where they outperform nonlinear ensembles in predicting values for data points from an external data set. This can be explained by the fact that linear models are less affected by small variations than nonlinear models, while they benefit equally from the key ensemble features.
Primarily, we note the impact of including more variables, spread across the optimized variable subsets used in the ensembles' intrinsic models, each of which individually satisfies the aforementioned rule on over-fitting. The overall ensemble responses are more stable and robust, with higher predictive power, than those of single models.
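The term pool and the ensemble response described in the abstract can be illustrated with a minimal Python sketch. It is not the NQSAR implementation: the function names `polynomial_terms`, `combine` and `enough_points` are hypothetical, the weighted response is assumed to be a normalized weighted average, and the subset selection and model fitting steps are not shown because the abstract does not describe them.

```python
from itertools import combinations
from statistics import mean, median


def polynomial_terms(x):
    """Expand one descriptor vector into the candidate term pool:
    linear terms, two-fold cross-products, squares, and cubes of
    single descriptors only (no mixed cubic terms)."""
    linear = list(x)
    cross = [a * b for a, b in combinations(x, 2)]
    squares = [a ** 2 for a in x]
    cubes = [a ** 3 for a in x]
    return linear + cross + squares + cubes


def combine(predictions, how="mean", weights=None):
    """Combine the predictions of the intrinsic models for one compound.
    The 'weighted' rule is an assumed normalized weighted average."""
    if how == "mean":
        return mean(predictions)
    if how == "median":
        return median(predictions)
    if how == "weighted":
        total = sum(weights)
        return sum(w * p for w, p in zip(weights, predictions)) / total
    raise ValueError(f"unknown combination rule: {how}")


def enough_points(n_points, n_coefficients, ratio=4):
    """Rule of thumb from the abstract: 4 to 5 times more data points
    than model coefficients (ratio=4 is the lower bound)."""
    return n_points >= ratio * n_coefficients
```

For two descriptors with values 2 and 3, `polynomial_terms([2, 3])` yields the two linear terms, one cross-product, two squares and two cubes. Each intrinsic model would then use only a controlled subset of such terms, small enough for `enough_points` to hold.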
Original language
English
Scientific fields
Chemistry
WORK AFFILIATION