The derivation of formulas for calculation of characteristic values of statistical parameters used in model quality estimation

Bojović, Viktor; Batista, Jadranko; Lučić, Bono

Bibliographic record number: 1216570

The derivation of formulas for calculation of characteristic values of statistical parameters used in model quality estimation // Book of Abstracts / Vančik, Hrvoj ; Cioslowski, Jerzy ; Namjesnik, Danijel (eds.).
Zagreb: Hrvatsko kemijsko društvo, 2021. p. 23 (lecture, international peer review, abstract, scientific)

CROSBI ID: 1216570

Title
The derivation of formulas for calculation of characteristic values of statistical parameters used in model quality estimation

Authors
Bojović, Viktor ; Batista, Jadranko ; Lučić, Bono

Type, subtype and category of work
Conference abstracts, abstract, scientific

Source
Book of Abstracts / Vančik, Hrvoj ; Cioslowski, Jerzy ; Namjesnik, Danijel - Zagreb : Hrvatsko kemijsko društvo, 2021, 23-23

ISBN
978-953-8334-02-3

Conference
32nd International Course and Conference on the Interfaces among Mathematics, Chemistry and Computer Sciences: Mathematics, Chemistry, Computing (Math/Chem/Comp, MC2-32)

Place and date
Dubrovnik, Croatia, 7-11 June 2021

Type of participation
Lecture

Type of peer review
International peer review

Keywords
correlation ; MCC ; QSP(A)R

Abstract
Recently, a new accuracy parameter was introduced for binary classification models; it estimates the real contribution of a model, that is, the part of its accuracy that exceeds the level of random accuracy [1, 2]. Randomization of the Y variable is a standard procedure in model quality analysis in general, and particularly in modelling relationships between the structure and the property/activity of chemical compounds (QSP(A)R). A randomization analysis gives a good result for a model when its actual accuracy, estimated by a selected statistical measure (parameter) of model quality, is significantly higher than the accuracy obtained by the best randomized model. In binary classification models, both the experimental variable Y and the corresponding model variable Y’ (estimated/predicted by the model) take the values 0 or 1 (inactive/active). The value of each statistical measure (parameter, metric) of such a model, derived from the comparison of the experimental (Y) and model (Y’) variables, can be expressed by four numbers forming a confusion matrix (table). These are TP (true positives) and TN (true negatives), which count the correct classifications (the same value in Y and Y’), and FN (false negatives) and FP (false positives), which count the incorrect predictions. By performing a permutation analysis of the values of the model variable Y’ while keeping the variable Y in its original order, we defined the characteristic values of a statistical measure (metric, parameter) as its border values (i.e. minimal and maximal) and its most probable (random) value. We derived formulae for estimating the characteristic values of the accuracy parameter (metric), which measures the percentage of correct classifications. The usefulness of the derived parameters will also be illustrated in the analysis of the quality of balanced and non-balanced QSP(A)R models and confirmed by simulation results.
Further, the possibility of applying the developed methodology to estimate the complexity of binary classification variables and models will be elaborated and generalized to multi-class problems.
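
The permutation idea in the abstract can be illustrated with a short Python sketch. It is not the authors' derivation: the abstract does not state the formulae, so the expressions below are only what follows from the fact that permuting Y’ keeps the number of ones in Y and in Y’ fixed. The function names are hypothetical, and the mean accuracy over permutations is used as a stand-in for the "most probable (random)" value, which is an assumption.

```python
import random


def confusion_counts(y, y_pred):
    """Return (TP, TN, FP, FN) for two equal-length sequences of 0/1 labels."""
    pairs = list(zip(y, y_pred))
    tp = sum(1 for a, b in pairs if a == 1 and b == 1)
    tn = sum(1 for a, b in pairs if a == 0 and b == 0)
    fp = sum(1 for a, b in pairs if a == 0 and b == 1)
    fn = sum(1 for a, b in pairs if a == 1 and b == 0)
    return tp, tn, fp, fn


def accuracy(y, y_pred):
    """Fraction of correct classifications, (TP + TN) / n."""
    tp, tn, _, _ = confusion_counts(y, y_pred)
    return (tp + tn) / len(y)


def characteristic_accuracies(y, y_pred):
    """Minimal, mean and maximal accuracy over all permutations of y_pred.

    With n items, p ones in y and q ones in y_pred, TP can only range from
    max(0, p + q - n) to min(p, q), and accuracy = (n - p - q + 2 * TP) / n.
    """
    n, p, q = len(y), sum(y), sum(y_pred)
    acc_min = abs(p + q - n) / n
    acc_max = (n - abs(p - q)) / n
    # Mean over random permutations, using E[TP] = p * q / n.
    # Assumption: taken here as a proxy for the "most probable" value.
    acc_mean = (p * q + (n - p) * (n - q)) / n**2
    return acc_min, acc_mean, acc_max


def simulate(y, y_pred, trials=5000, seed=0):
    """Shuffle y_pred repeatedly (y stays fixed); return (min, mean, max) accuracy."""
    rng = random.Random(seed)
    shuffled = list(y_pred)
    accs = []
    for _ in range(trials):
        rng.shuffle(shuffled)
        accs.append(accuracy(y, shuffled))
    return min(accs), sum(accs) / trials, max(accs)


if __name__ == "__main__":
    # Hypothetical non-balanced example: 3 actives out of 10 compounds.
    y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    print("actual accuracy:", accuracy(y, y_pred))
    print("closed form (min, mean, max):", characteristic_accuracies(y, y_pred))
    print("simulation  (min, mean, max):", simulate(y, y_pred))
```

For this example the actual accuracy is 0.8, while the closed-form minimal, mean and maximal accuracies under permutation are 0.4, 0.58 and 1.0. The raw accuracy should therefore be read against a random level of about 0.58 rather than against 0.5.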

Original language
English

Scientific fields
Chemistry, Biology, Information and Communication Sciences

Note
Participant - Lecturer: Bono Lučić

RELATED TO THIS WORK

Projects:
HRZZ-DOK-2018-01-9531 - Bioprospecting of the Adriatic Sea (Lučić, Bono, HRZZ - 2018-01) (CroRIS)

Institutions:
Institut "Ruđer Bošković", Zagreb

Profiles:

Bono Lučić (author)

Viktor Bojović (author)

mcc.hkd.hr
