Effect of information leakage and method of splitting (rational and random) on external predictive ability and behavior of different statistical parameters of QSAR model (CROSBI ID 268327)
Journal contribution | original scientific paper | international peer review
Authorship information
Masand, Vijay H. ; Mahajan, Devidas T. ; Nazeruddin, Gulam M. ; Ben Hadda, Taibi ; Rastija, Vesna ; Alfeefy, Ahmed M.
English
Quantitative structure-activity relationship (QSAR) analysis not only provides guidelines on the structural features responsible for biological activity but can also be used to predict the desired activity of untested chemicals prior to synthesis. Therefore, appropriate validation of any QSAR model is of utmost importance for judging its external predictive ability. Generally, internal and external validations (preferred by many) are used in the absence of a true external dataset. A model developed using the external method may not be reliable, because omitting some compounds, especially in small datasets, may prevent it from capturing all the essential features required for the particular SAR. In external validation, the dataset is split either rationally or randomly before descriptor selection. In the present study, rational splitting of the dataset was performed using a novel method, and its effect on the statistical parameters was analyzed. The analysis reveals that the predictive ability of a QSAR model is sensitive to (1) the method of splitting and (2) the distribution of the training and prediction sets. In addition, purposeful selection can be used to influence the statistical parameters; therefore, external validation based on a single split is insufficient to guarantee the true predictive ability of a QSAR model. Moreover, it appears that selecting descriptors prior to splitting (information leakage) plays little role in determining the external predictivity of the model. The present study reveals that as many statistical parameters as possible should be examined, along with bootstrapping, instead of relying on a single external validation.
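The abstract's point that a single split cannot guarantee predictive ability can be illustrated with a minimal sketch. It is not the authors' method: the data are synthetic, the "model" is a one-descriptor least-squares line, and the function names (`fit_line`, `external_r2`, `repeated_random_splits`) are hypothetical. The external R² here uses the prediction-set mean as its reference, which is only one of several definitions in use.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x on a single descriptor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def external_r2(train, test):
    """R^2 on the prediction set (assumption: prediction-set mean as reference)."""
    a, b = fit_line([x for x, _ in train], [y for _, y in train])
    ys = [y for _, y in test]
    mean_y = statistics.fmean(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in test)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def repeated_random_splits(data, n_splits=200, test_fraction=0.25, seed=0):
    """Score the same modelling recipe over many random train/prediction splits."""
    rng = random.Random(seed)
    n_test = max(2, round(len(data) * test_fraction))
    scores = []
    for _ in range(n_splits):
        shuffled = data[:]
        rng.shuffle(shuffled)
        scores.append(external_r2(shuffled[n_test:], shuffled[:n_test]))
    return scores


# Synthetic "dataset": 30 compounds, one descriptor, noisy linear activity.
_rng = random.Random(42)
data = [(x, 2.0 * x + _rng.gauss(0.0, 1.0))
        for x in (_rng.uniform(0.0, 5.0) for _ in range(30))]

scores = repeated_random_splits(data)
print("min / median / max external R^2:",
      round(min(scores), 3),
      round(statistics.median(scores), 3),
      round(max(scores), 3))
```

The spread between the minimum and maximum scores shows how much a reported external statistic can depend on which compounds happen to fall in the prediction set. Any single value from this range could be picked, deliberately or by chance, as "the" external validation result.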
QSAR ; external validation ; statistical parameters ; splitting methods ; predictivity
Publication data
24 (3)
2015.
1241-1264
published
1054-2523
1554-8120
10.1007/s00044-014-1193-8
Related fields
Biotechnology, Agriculture (agronomy)