Data source: CROSBI

The derivation of formulas for calculation of characteristic values of statistical parameters used in model quality estimation (CROSBI ID 723357)

Conference contribution in proceedings | abstract of a conference presentation | international peer review

Bojović, Viktor ; Batista, Jadranko ; Lučić, Bono The derivation of formulas for calculation of characteristic values of statistical parameters used in model quality estimation // Book of Abstracts / Vančik, Hrvoj ; Cioslowski, Jerzy ; Namjesnik, Danijel (eds.). Zagreb: Hrvatsko kemijsko društvo, 2021. p. 23

Responsibility information

Bojović, Viktor ; Batista, Jadranko ; Lučić, Bono

English

The derivation of formulas for calculation of characteristic values of statistical parameters used in model quality estimation

Recently, a new accuracy parameter was introduced for binary classification models; it can be used to estimate the real contribution of a model above the level of random accuracy [1, 2]. Randomization of the Y variable is a standard procedure in model quality analysis in general, and particularly in modelling the relationships between the structure and the property/activity (QSP(A)R) of chemical compounds. A randomization analysis gives a good result for a model when its actual accuracy, estimated by a selected statistical measure (parameter) of model quality, is significantly higher than the accuracy of the best randomized model. In binary classification models, both the experimental variable Y and the corresponding model variable Y’ (estimated/predicted by the model) take the values 0 or 1 (inactive/active). The value of each statistical measure (parameter, metric) of such a model, derived from the comparison of the experimental (Y) and model (Y’) variables, can be expressed by four numbers forming a confusion matrix/table. These are TP (true positives) and TN (true negatives), which together give the total number of correct classifications (cases where Y and Y’ have the same value), and FN (false negatives) and FP (false positives), which count the incorrect predictions. By permuting the values of the model variable Y’ while keeping the variable Y in its original order, we defined the characteristic values of a statistical measure (metric, parameter) as its border values (i.e. minimal and maximal) and its most probable (random) value. We derived formulae for estimating the characteristic values of the accuracy parameter (metric), which measures the percentage of correct classifications. The usefulness of the derived parameters will be illustrated in an analysis of the quality of balanced and non-balanced QSP(A)R models and confirmed by simulation results.
Further, the possibility of applying the developed methodology to estimate the complexity of binary classification variables and models will be elaborated and generalized to multi-class problems.
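
The permutation procedure described in the abstract can be illustrated with a short Python sketch. It is not the authors' derivation, since the abstract does not state their formulae. The closed-form expressions below are elementary bounds that follow from the class counts alone, and the expected accuracy under permutation is used here as a stand-in for the "most probable (random) value"; both are assumptions made for this illustration. All function names are hypothetical.

```python
import random


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for two equal-length sequences of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of correct classifications, (TP + TN) / N."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def characteristic_accuracies(y_true, y_pred):
    """Minimal, expected (random) and maximal accuracy over all permutations
    of y_pred with y_true fixed.

    Assumption of this sketch: permuting y_pred keeps the number of predicted
    1s fixed, so the number of correct cases equals 2*TP + n0 - m1, with TP
    ranging from max(0, m1 - n0) to min(n1, m1) and a mean of n1*m1/n.
    """
    n = len(y_true)
    n1 = sum(y_true)   # experimentally active (Y = 1)
    n0 = n - n1        # experimentally inactive (Y = 0)
    m1 = sum(y_pred)   # predicted active (Y' = 1)
    m0 = n - m1        # predicted inactive (Y' = 0)
    acc_min = abs(n0 - m1) / n
    acc_rand = (n1 * m1 + n0 * m0) / n ** 2
    acc_max = (n - abs(n1 - m1)) / n
    return acc_min, acc_rand, acc_max


def simulate_permutations(y_true, y_pred, n_perm=2000, seed=0):
    """Estimate the same three values by random permutations of y_pred."""
    rng = random.Random(seed)
    shuffled = list(y_pred)
    accs = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        accs.append(accuracy(y_true, shuffled))
    return min(accs), sum(accs) / len(accs), max(accs)


if __name__ == "__main__":
    # Made-up example: 6 active and 4 inactive compounds.
    y = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
    y_model = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
    print("model accuracy:", accuracy(y, y_model))
    print("closed form (min, random, max):", characteristic_accuracies(y, y_model))
    print("simulation  (min, mean, max):", simulate_permutations(y, y_model))
```

Comparing the actual accuracy with the random value shows how much of it is attributable to the model rather than to the class distribution. For a strongly non-balanced data set, the minimal, random and maximal values can lie close together, which is why the raw accuracy alone can be misleading.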

correlation ; MCC ; QSP(A)R

Participant - Lecturer: Bono Lučić


Contribution information

23-23.

2021.

published

Parent publication information

Book of Abstracts

Vančik, Hrvoj ; Cioslowski, Jerzy ; Namjesnik, Danijel

Zagreb: Hrvatsko kemijsko društvo

978-953-8334-02-3

Conference information

32nd International Course and Conference on the Interfaces among Mathematics, Chemistry and Computer Sciences: Mathematics, Chemistry, Computing (Math/Chem/Comp, MC2-32)

lecture

07.06.2021-12.06.2021

Dubrovnik, Croatia

Related fields

Biology, Information and communication sciences, Chemistry
