Bibliographic record no.: 1216670
Derivation of formulae for calculation of minimal and maximal values of model evaluation metrics and their use in evaluation of variable monotonicity
Derivation of formulae for calculation of minimal and maximal values of model evaluation metrics and their use in evaluation of variable monotonicity // Math/Chem/Comp 2022 and 33rd MC2 Conference : Book of Abstracts / Vančik, Hrvoj ; Cioslowski, Jerzy ; Namjesnik, Danijel (eds.).
Zagreb: Hrvatsko kemijsko društvo, 2022. p. 22-22 (lecture, international peer review, abstract, scientific)
CROSBI ID: 1216670
Title
Derivation of formulae for calculation of minimal
and maximal values of model evaluation metrics and
their use in evaluation of variable monotonicity
Authors
Bojović, Viktor ; Skala, Karolj ; Lučić, Bono
Type, subtype and category of work
Conference abstracts, abstract, scientific
Source
Math/Chem/Comp 2022 and 33rd MC2 Conference : Book of Abstracts
/ Vančik, Hrvoj ; Cioslowski, Jerzy ; Namjesnik, Danijel - Zagreb : Hrvatsko kemijsko društvo, 2022, 22-22
ISBN
978-953-8334-03-0
Conference
33rd MC2 Conference (Math/Chem/Comp 2022)
Place and date
Dubrovnik, Croatia, 06.07.2022 - 10.07.2022
Type of participation
Lecture
Type of peer review
International peer review
Keywords
statistical modeling ; MCC, correlation, Matthews coefficient ; binary variables
Abstract
In the development of structure-property relationship models, and of multivariate models in general, there is a tendency to include as few structural variables (molecular descriptors) as possible in the final optimised models. Owing to the ubiquitous digitisation of data and the use of molecular fingerprints, molecular descriptors are increasingly binary variables with values of 1 or 0. Even the experimental properties of chemical compounds are often expressed as binary values, e.g. toxic (1) or non-toxic (0). We have calculated and simulated the correspondence between two binary variables in the paired and unpaired sorting cases. These two experiments give the maximum possible correspondence of the variables (paired sorting) and the minimum possible correspondence (unpaired sorting). Theoretically, we have derived formulae for calculating the maximum and minimum agreement between two variables in which x values of the first variable and y values of the second variable belong to class 1. The difference between the minimum and maximum values reflects the information content, or monotonicity, between the two variables. For the case x = y, we obtain expressions that measure the monotonicity of a single variable with x values in class 1 and (N - x) values in class 0. If the numbers of elements in class 1 and class 0 are expressed through the entries of the error (confusion) matrix of a binary classification (TP: true positive, TN: true negative, FN: false negative, FP: false positive; x = TP + FN, N - x = TN + FP), we obtain general expressions for calculating the maximum and minimum values of any measure of agreement, correlation or error between two classification variables. In addition to evaluating the monotonicity of variables, the results are also used to evaluate the quality of binary classification models.
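The abstract does not reproduce the derived formulae, so the following Python sketch is only an illustration of the idea, not the authors' method. It assumes that "paired sorting" means both variables sorted in the same order and "unpaired sorting" means they are sorted in opposite orders, and it uses the standard bounds on the overlap of two sets with fixed sizes: max(0, x + y - N) <= TP <= min(x, y). The function names (`confusion_bounds`, `mcc`, `simulate_sorting`) are hypothetical, and the Matthews correlation coefficient is used as one example of a metric evaluated at both extremes.

```python
from math import sqrt


def confusion_bounds(n, x, y):
    """Confusion matrices with the smallest and largest possible TP.

    n: total number of elements; x: class-1 count in the first variable
    (x = TP + FN); y: class-1 count in the second variable (y = TP + FP).
    """
    if not (0 <= x <= n and 0 <= y <= n):
        raise ValueError("x and y must lie between 0 and n")

    def matrix(tp):
        # The other three cells follow from the fixed counts x, y and n.
        return {"TP": tp, "FN": x - tp, "FP": y - tp, "TN": n - x - y + tp}

    tp_min = max(0, x + y - n)  # least overlap of the two class-1 groups
    tp_max = min(x, y)          # greatest overlap of the two class-1 groups
    return matrix(tp_min), matrix(tp_max)


def mcc(m):
    """Matthews correlation coefficient of a confusion matrix dict."""
    tp, fn, fp, tn = m["TP"], m["FN"], m["FP"], m["TN"]
    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    # Assumption: return 0.0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def simulate_sorting(n, x, y):
    """Count TP when the variables are sorted the same way or oppositely."""
    first = [1] * x + [0] * (n - x)    # first variable, ones first
    paired = [1] * y + [0] * (n - y)   # second variable, same order
    unpaired = paired[::-1]            # second variable, reversed order
    tp_max = sum(a & b for a, b in zip(first, paired))
    tp_min = sum(a & b for a, b in zip(first, unpaired))
    return tp_min, tp_max


if __name__ == "__main__":
    lowest, highest = confusion_bounds(10, 6, 7)
    print("TP from sorting:", simulate_sorting(10, 6, 7))
    print("MCC range:", mcc(lowest), mcc(highest))
```

With x and y fixed, TP*TN - FP*FN simplifies to N*TP - x*y, so the MCC grows linearly with TP and its extremes are reached at the two bounding matrices. Other metrics need not be monotone in TP in the same direction, so this shortcut should be checked for each metric. For x = y the upper bound gives TP = x and an MCC of 1.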
Original language
English
Scientific fields
Mathematics, Chemistry, Biology, Computing, Information and Communication Sciences
Note
Participant - lecturer: Viktor Bojović
RELATED ITEMS
Projects:
HRZZ-DOK-2018-01-9531 - Bioprospecting Jadranskog mora (Lučić, Bono, HRZZ - 2018-01)
Institutions:
Fakultet elektrotehnike i računarstva, Zagreb
Institut "Ruđer Bošković", Zagreb