The derivation of formulas for calculation of characteristic values of statistical parameters used in model quality estimation (CROSBI ID 723357)
Conference contribution in proceedings | abstract of conference presentation | international peer review
Responsibility details
Bojović, Viktor ; Batista, Jadranko ; Lučić, Bono
English
Recently, a new accuracy parameter was introduced for binary classification models; it can be used to estimate the real contribution of a model above the level of random accuracy [1, 2]. Randomization of the Y variable is a standard procedure in model quality analysis in general, and particularly in modelling relationships between the structure and property/activity (QSP(A)R) of chemical compounds. A randomization analysis gives a good result for a model when its actual accuracy, estimated by a selected statistical measure (parameter) of model quality, is significantly higher than the accuracy obtained by the best randomized model. In binary classification models, both the experimental variable Y and the corresponding model variable Y’ (estimated/predicted by the model) take the values 0 or 1 (inactive/active). The value of each statistical measure (parameter, metric) of such a model, derived from the comparison of the experimental (Y) and model (Y’) variables, can be expressed by four numbers forming a confusion matrix (table). These are TP (true positives) and TN (true negatives), which together give the total number of correct classifications (cases where Y and Y’ have the same value), and FN (false negatives) and FP (false positives), which are the incorrect predictions. By performing a permutation analysis of the values of the model variable Y’ while keeping the variable Y in its original order, we defined the characteristic values of a statistical measure (metric, parameter) as its border values (i.e. minimal and maximal) and its most probable (random) value. We derived formulae for estimating the characteristic values of the accuracy parameter (metric), which measures the percentage of correct classifications. The usefulness of the derived parameters will be illustrated in the analysis of the quality of balanced and non-balanced QSP(A)R models and confirmed by simulation results.
Further, the possible application of the developed methodology to estimating the complexity of binary classification variables and models will be elaborated and generalized to multi-class problems.
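The abstract does not state the derived formulae, so the following is only a minimal sketch of how characteristic values of accuracy under permutation of Y’ could be computed. It assumes the standard combinatorial argument: with Y fixed and Y’ permuted, the class totals of both variables stay constant, so TP can only range between max(0, P + P’ − N) and min(P, P’), where P and P’ are the numbers of 1s in Y and Y’ and N is the total count. The expected accuracy over all permutations is used here as a stand-in for the "most probable (random) value"; this choice, and the function names `accuracy_characteristic_values` and `permuted_accuracies`, are assumptions made for illustration and not the authors' definitions.

```python
import random


def accuracy_characteristic_values(tp, fn, fp, tn):
    """Return (min, expected, max) accuracy over all permutations of Y'.

    Assumption: only the class totals of Y and Y' matter, because a
    permutation of Y' leaves those totals unchanged.
    """
    n = tp + fn + fp + tn
    actual_pos = tp + fn          # number of 1s in Y
    pred_pos = tp + fp            # number of 1s in Y'
    actual_neg = n - actual_pos   # number of 0s in Y
    pred_neg = n - pred_pos       # number of 0s in Y'

    # Correct predictions = N - P - P' + 2*TP, so the bounds on TP
    # translate directly into bounds on accuracy.
    acc_min = abs(n - actual_pos - pred_pos) / n
    acc_max = (n - abs(actual_pos - pred_pos)) / n
    # Mean accuracy when Y' is shuffled uniformly at random.
    acc_expected = (actual_pos * pred_pos + actual_neg * pred_neg) / n**2
    return acc_min, acc_expected, acc_max


def permuted_accuracies(y, y_pred, n_perm=2000, seed=0):
    """Accuracies obtained by shuffling Y' while Y keeps its order."""
    rng = random.Random(seed)
    shuffled = list(y_pred)
    accuracies = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        correct = sum(a == b for a, b in zip(y, shuffled))
        accuracies.append(correct / len(y))
    return accuracies


# Hypothetical non-balanced example: TP=20, FN=10, FP=15, TN=55.
y = [1] * 30 + [0] * 70
y_pred = [1] * 20 + [0] * 10 + [1] * 15 + [0] * 55

acc_min, acc_expected, acc_max = accuracy_characteristic_values(20, 10, 15, 55)
simulated = permuted_accuracies(y, y_pred)
simulated_mean = sum(simulated) / len(simulated)
```

In this hypothetical example the actual accuracy is 0.75, which can be compared with the three characteristic values and with the simulated distribution. A hypergeometric mode could be used instead of the mean if the most probable value in the strict sense is needed.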
correlation ; MCC ; QSP(A)R
Participant - Lecturer: Bono Lučić
Contribution details
23-23.
2021.
published
Parent publication details
Book of Abstracts
Vančik, Hrvoj ; Cioslowski, Jerzy ; Namjesnik, Danijel
Zagreb: Hrvatsko kemijsko društvo
978-953-8334-02-3
Conference details
32nd International Course and Conference on the Interfaces among Mathematics, Chemistry and Computer Sciences: Mathematics, Chemistry, Computing (Math/Chem/Comp, MC2-32)
lecture
07.06.2021-12.06.2021
Dubrovnik, Croatia