Derivation of formulae for calculation of minimal and maximal values of model evaluation metrics and their use in evaluation of variable monotonicity (CROSBI ID 723382)
Conference contribution in proceedings | conference abstract | international peer review
Responsibility details
Bojović, Viktor ; Skala, Karolj ; Lučić, Bono
English
Derivation of formulae for calculation of minimal and maximal values of model evaluation metrics and their use in evaluation of variable monotonicity
In the development of structure-property relationship models, and of multivariate models in general, there is a tendency to include as few structural variables (molecular descriptors) as possible in the final optimised model. Owing to the ubiquitous digitisation of data and the use of molecular fingerprints, molecular descriptors are increasingly binary variables with values of 1 or 0. Even experimental properties of chemical compounds are expressed as binary values, e.g. toxic (1) or non-toxic (0). We calculated and simulated the correspondence between two binary variables in the paired and unpaired sorting cases. These two experiments give the maximum possible correspondence (paired sorting) and the minimum possible correspondence (unpaired sorting) between the variables. We then derived formulae for the maximum and minimum agreement between two variables in which x values of the first variable and y values of the second variable belong to class 1. The difference between the minimum and maximum values reflects the information content, or monotonicity, between the two variables. For the case x = y, we obtain expressions that measure the monotonicity of a single variable with x values in class 1 and (N - x) values in class 0. If the numbers of elements in class 1 and class 0 are expressed through the entries of the error (confusion) matrix of a binary classification (TP: true positive, TN: true negative, FN: false negative, FP: false positive; x = TP + FN, N - x = TN + FP), we obtain general expressions for the maximum and minimum values of any measure of agreement, correlation or error between two classification variables. Besides evaluating the monotonicity of variables, the results are also used to evaluate the quality of binary classification models.
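The abstract does not reproduce the derived formulae. As an illustration only, the sketch below uses the standard counting bounds for two binary variables of length N with x and y values in class 1: the number of shared class-1 positions (TP) lies between max(0, x + y - N) and min(x, y). All function names are hypothetical, and taking x as the actual positives (TP + FN) and y as the predicted positives (TP + FP) is an assumption extending the abstract's x = TP + FN; this is not presented as the authors' own derivation.

```python
from math import sqrt

def tp_bounds(n, x, y):
    """Smallest and largest possible TP for fixed class-1 counts x and y.
    Standard counting bounds, assumed here; not taken from the abstract."""
    if not (0 <= x <= n and 0 <= y <= n):
        raise ValueError("x and y must lie between 0 and n")
    return max(0, x + y - n), min(x, y)

def confusion_from_tp(n, x, y, tp):
    """Fill in the error matrix (TP, TN, FP, FN) once TP is fixed.
    Assumes x = TP + FN (actual positives) and y = TP + FP (predicted)."""
    fn = x - tp
    fp = y - tp
    tn = n - x - y + tp
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 for a zero denominator."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def accuracy(tp, tn, fp, fn):
    """Fraction of positions on which the two variables agree."""
    return (tp + tn) / (tp + tn + fp + fn)

def metric_range(metric, n, x, y):
    """Metric value at the minimal and at the maximal TP."""
    lo, hi = tp_bounds(n, x, y)
    return (metric(*confusion_from_tp(n, x, y, lo)),
            metric(*confusion_from_tp(n, x, y, hi)))

def sorted_overlap(n, x, y):
    """Simulate the two sorting experiments: both variables sorted the same
    way (paired) versus in opposite order (unpaired), counting shared 1s."""
    a = sorted([1] * x + [0] * (n - x), reverse=True)
    b_same = sorted([1] * y + [0] * (n - y), reverse=True)
    b_opposite = sorted(b_same)
    shared = lambda u, v: sum(p & q for p, q in zip(u, v))
    return shared(a, b_opposite), shared(a, b_same)

if __name__ == "__main__":
    print(tp_bounds(10, 4, 7))
    print(sorted_overlap(10, 4, 7))
    print(metric_range(mcc, 10, 5, 5))
```

For fixed x, y and N, accuracy and MCC both increase with TP, so `metric_range` returns their minimum first and their maximum second; for an error measure the order would be reversed.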
statistical modeling ; MCC, correlation, Matthews coefficient ; binary variables
Participant - lecturer: Viktor Bojović
Contribution details
22-22.
2022.
published
Parent publication details
Math/Chem/Comp 2022 and 33rd MC2 Conference : Book of Abstracts
Vančik, Hrvoj ; Cioslowski, Jerzy ; Namjesnik, Danijel
Zagreb: Hrvatsko kemijsko društvo
978-953-8334-03-0
Conference details
33rd MC2 Conference (Math/Chem/Comp 2022)
lecture
06.07.2022-10.07.2022
Dubrovnik, Croatia
Related fields
Biology, Information and Communication Sciences, Chemistry, Mathematics, Computer Science