The derivation of formulas for calculation of characteristic values of statistical parameters used in model quality estimation

Bojović, Viktor; Batista, Jadranko; Lučić, Bono

Data source: CROSBI

The derivation of formulas for calculation of characteristic values of statistical parameters used in model quality estimation (CROSBI ID 723357)

Conference proceedings contribution | conference abstract | international peer review

Bojović, Viktor ; Batista, Jadranko ; Lučić, Bono The derivation of formulas for calculation of characteristic values of statistical parameters used in model quality estimation // Book of Abstracts / Vančik, Hrvoj ; Cioslowski, Jerzy ; Namjesnik, Danijel (eds.). Zagreb: Hrvatsko kemijsko društvo, 2021. p. 23

Responsibility data

Authors

Bojović, Viktor ; Batista, Jadranko ; Lučić, Bono

Basic data in the original language
Basic data in other languages

Language

English

Title

The derivation of formulas for calculation of characteristic values of statistical parameters used in model quality estimation

Abstract

Recently, a new accuracy parameter was introduced for binary classification models; it estimates the real contribution of a model, i.e. the part of its accuracy that exceeds the level of random accuracy [1, 2]. Randomization of the Y variable is a standard procedure in model quality analysis in general, and particularly in modelling relationships between the structure and the property/activity of chemical compounds (QSP(A)R). A randomization analysis gives a good result when the actual accuracy of the model, estimated by a selected statistical measure (parameter) of model quality, is significantly higher than the accuracy of the best randomized model. In binary classification models, both the experimental variable Y and the corresponding model variable Y’ (estimated/predicted by the model) take the values 0 or 1 (inactive/active). The value of any statistical measure (parameter, metric) of such a model, derived from the comparison of the experimental (Y) and model (Y’) variables, can be expressed through the four numbers that form the confusion matrix (table). These are TP (true positives) and TN (true negatives), which together give the total number of correct classifications (cases in which Y and Y’ have the same value), and FN (false negatives) and FP (false positives), which count the incorrect predictions. By permuting the values of the model variable Y’ while keeping the variable Y in its original order, we defined the characteristic values of a statistical measure (metric, parameter) as its border values (i.e. minimal and maximal) and its most probable (random) value. We derived formulae for estimating the characteristic values of the accuracy parameter (metric), which measures the percentage of correct classifications. The usefulness of the derived parameters will be illustrated in the analysis of the quality of balanced and non-balanced QSP(A)R models and confirmed by simulation results.
Further, the possibility of applying the developed methodology to estimate the complexity of binary classification variables and models will be elaborated and generalized to multi-class problems.
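
The permutation analysis described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the derived formulae, so the closed-form expressions below are not taken from the authors; they are the standard combinatorial bounds that follow when Y’ is permuted against a fixed Y, and the expected accuracy under random permutation is used as a stand-in for the "most probable (random)" value. All function names and the example data are hypothetical.

```python
import random


def confusion_counts(y, y_pred):
    """Return (TP, TN, FP, FN) for two 0/1 sequences of equal length."""
    tp = sum(1 for a, b in zip(y, y_pred) if a == 1 and b == 1)
    tn = sum(1 for a, b in zip(y, y_pred) if a == 0 and b == 0)
    fp = sum(1 for a, b in zip(y, y_pred) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(y, y_pred) if a == 1 and b == 0)
    return tp, tn, fp, fn


def accuracy(y, y_pred):
    """Fraction of correct classifications, (TP + TN) / n."""
    tp, tn, _, _ = confusion_counts(y, y_pred)
    return (tp + tn) / len(y)


def characteristic_accuracies(y, y_pred):
    """Assumed closed forms (not quoted from the abstract).

    With n samples, p actual positives and q predicted positives,
    permuting y_pred lets TP range from max(0, p + q - n) to min(p, q),
    and the number of correct classifications equals 2*TP + n - p - q.
    """
    n, p, q = len(y), sum(y), sum(y_pred)
    acc_min = abs(n - p - q) / n
    acc_max = (n - abs(p - q)) / n
    # Expected accuracy over all permutations (E[TP] = p*q/n).
    acc_rand = (p * q + (n - p) * (n - q)) / n ** 2
    return acc_min, acc_rand, acc_max


def permutation_accuracies(y, y_pred, n_perm=2000, seed=0):
    """Accuracies obtained by shuffling y_pred while y keeps its order."""
    rng = random.Random(seed)
    shuffled = list(y_pred)
    accs = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        accs.append(accuracy(y, shuffled))
    return accs


# Hypothetical non-balanced example: 3 actives, 7 inactives.
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]

acc_min, acc_rand, acc_max = characteristic_accuracies(y, y_pred)
simulated = permutation_accuracies(y, y_pred)
print("actual accuracy:", accuracy(y, y_pred))
print("min / random / max:", acc_min, acc_rand, acc_max)
print("simulated mean:", sum(simulated) / len(simulated))
```

Comparing the actual accuracy with the random value, rather than with zero, is what separates the real contribution of a model from what the class distributions alone would produce; the simulation serves only as a check that the permuted accuracies stay within the two border values.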

Keywords

correlation ; MCC ; QSP(A)R

Note

Participant - Lecturer: Bono Lučić

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Contribution data

Pages

23-23.

Year of publication

2021.

Publication status

published

Parent publication data

Title

Book of Abstracts

Editors

Vančik, Hrvoj ; Cioslowski, Jerzy ; Namjesnik, Danijel

Publisher

Zagreb: Hrvatsko kemijsko društvo

ISBN

978-953-8334-02-3

Conference data

Conference

32nd International Course and Conference on the Interfaces among Mathematics, Chemistry and Computer Sciences: Mathematics, Chemistry, Computing (Math/Chem/Comp, MC2-32)

Type of participation

lecture

Conference dates

07.06.2021-12.06.2021

Conference venue

Dubrovnik, Croatia

Related entities

Related persons

Viktor Bojović (author)

Bono Lučić (author)

Related institutions

Institut Ruđer Bošković (098) (author's institution)

Related projects

Bioprospecting Jadranskog mora (result of work on the project)

Field

Biology, Information and Communication Sciences, Chemistry

Links

mcc.hkd.hr