Bibliographic record no. 1216570
The derivation of formulas for calculation of characteristic values of statistical parameters used in model quality estimation
The derivation of formulas for calculation of characteristic values of statistical parameters used in model quality estimation // Book of Abstracts / Vančik, Hrvoj ; Cioslowski, Jerzy ; Namjesnik, Danijel (eds.).
Zagreb: Hrvatsko kemijsko društvo, 2021. p. 23 (lecture, international peer review, abstract, scientific)
CROSBI ID: 1216570
Authors
Bojović, Viktor ; Batista, Jadranko ; Lučić, Bono
Type, subtype and category of work
Conference abstracts, abstract, scientific
Source
Book of Abstracts
/ Vančik, Hrvoj ; Cioslowski, Jerzy ; Namjesnik, Danijel (eds.) - Zagreb : Hrvatsko kemijsko društvo, 2021, p. 23
ISBN
978-953-8334-02-3
Conference
32nd International Course and Conference on the Interfaces among Mathematics, Chemistry and Computer Sciences: Mathematics, Chemistry, Computing (Math/Chem/Comp, MC2-32)
Place and date
Dubrovnik, Croatia, 07.06.2021 - 11.06.2021
Type of participation
Lecture
Type of peer review
International peer review
Keywords
correlation ; MCC ; QSP(A)R
Abstract
Recently, a new accuracy parameter was introduced for binary classification models that can be used to estimate the (real) contribution of a model above the level of random accuracy [1, 2]. Randomization of the Y variable is a standard procedure in model quality analysis in general, and particularly in modelling quantitative structure-property/activity relationships (QSP(A)R) of chemical compounds. A randomization analysis gives a good result for a model when its actual accuracy, estimated by a selected statistical measure (parameter) of model quality, is significantly higher than the accuracy of the best randomized model. In binary classification models, both the experimental variable Y and the corresponding model variable Y’ (estimated/predicted by the model) take the values 0 or 1 (inactive/active). Every statistical measure (parameter, metric) of such a model is derived by comparing the experimental (Y) and model (Y’) variables and can be expressed in terms of the four numbers that form the confusion matrix (table): TP (true positives) and TN (true negatives), which together give the total number of correct classifications (cases where Y and Y’ have the same value), and FN (false negatives) and FP (false positives), which count the incorrect predictions. By permuting the values of the model variable Y’ while keeping the variable Y in its original order, we defined the characteristic values of a statistical measure (metric, parameter) as its boundary values (i.e., minimum and maximum) and its most probable (random) value. We derived formulas for the characteristic values of the accuracy parameter (metric), which measures the percentage of correct classifications. The usefulness of the derived parameters will be illustrated by analysing the quality of balanced and imbalanced QSP(A)R models and confirmed by simulation results.
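The permutation analysis of accuracy described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' derived formulas: it assumes that permuting Y’ leaves n, P = sum(Y) and Q = sum(Y’) unchanged, so TP follows a hypergeometric distribution; it takes "most probable" to mean the mode of that distribution and reports the mean separately; and all function names and the example labels are hypothetical.

```python
from math import comb
import random


def confusion_matrix(y, y_pred):
    """Return (TP, TN, FP, FN) for two equal-length sequences of 0/1 labels."""
    pairs = list(zip(y, y_pred))
    tp = sum(1 for a, b in pairs if a == 1 and b == 1)
    tn = sum(1 for a, b in pairs if a == 0 and b == 0)
    fp = sum(1 for a, b in pairs if a == 0 and b == 1)
    fn = sum(1 for a, b in pairs if a == 1 and b == 0)
    return tp, tn, fp, fn


def accuracy(y, y_pred):
    """Fraction of correct classifications: (TP + TN) / n."""
    tp, tn, fp, fn = confusion_matrix(y, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def accuracy_characteristic_values(y, y_pred):
    """Exact accuracy extremes and random values over all permutations of y_pred.

    Hypothetical helper: permuting y_pred keeps n, P = sum(y) and Q = sum(y_pred)
    fixed, so TN = n - P - Q + TP and accuracy depends only on TP.
    """
    n, p, q = len(y), sum(y), sum(y_pred)
    tp_min, tp_max = max(0, p + q - n), min(p, q)

    def acc(tp):
        return (n - p - q + 2 * tp) / n

    # Under a uniform random permutation TP is hypergeometric:
    # Pr(TP = k) = C(P, k) * C(n - P, Q - k) / C(n, Q); only the numerators are needed.
    counts = {k: comb(p, k) * comb(n - p, q - k) for k in range(tp_min, tp_max + 1)}
    tp_mode = max(counts, key=counts.get)  # lowest mode if there is a tie
    return {
        "min": acc(tp_min),            # equals |n - P - Q| / n
        "max": acc(tp_max),            # equals 1 - |P - Q| / n
        "most_probable": acc(tp_mode),
        "expected": (p * q + (n - p) * (n - q)) / n**2,  # mean over all permutations
    }


def permutation_accuracies(y, y_pred, n_perm=10_000, seed=0):
    """Monte Carlo check: accuracies of randomly shuffled copies of y_pred."""
    rng = random.Random(seed)
    shuffled = list(y_pred)
    results = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        results.append(accuracy(y, shuffled))
    return results


y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]       # hypothetical experimental labels (Y)
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]  # hypothetical model labels (Y')
print(confusion_matrix(y, y_pred))        # (2, 6, 1, 1)
print(accuracy(y, y_pred))                # 0.8
print(accuracy_characteristic_values(y, y_pred))
# {'min': 0.4, 'max': 1.0, 'most_probable': 0.6, 'expected': 0.58}
```

The exact enumeration and the Monte Carlo shuffles should agree: shuffled accuracies never leave the [min, max] interval, and their average approaches the expected value. The closed forms in the comments follow from TP ranging between max(0, P + Q - n) and min(P, Q).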
Further, the possible application of the developed methodology to estimating the complexity of binary classification variables and models will be discussed, and the methodology will be generalized to multi-class problems.
Original language
English
Scientific fields
Chemistry, Biology, Information and Communication Sciences
Note
Participant - Lecturer: Bono Lučić
RELATED INFORMATION
Projects:
HRZZ-DOK-2018-01-9531 - Bioprospecting Jadranskog mora (Bioprospecting of the Adriatic Sea) (Lučić, Bono, HRZZ - 2018-01)
Institutions:
Ruđer Bošković Institute, Zagreb