Efficiency of the Adjusted Binary Classification (ABC) Approach in Osteometric Sex Estimation: A Comparative Study of Different Linear Machine Learning Algorithms and Training Sample Sizes

Attia, MennattAllah Hassan; Kholief, Marwa A.; Zaghloul, Nancy M.; Kružić, Ivana; Anđelinović, Šimun; Bašić, Željana; Jerković, Ivan

doi:10.3390/biology11060917

Efficiency of the Adjusted Binary Classification (ABC) Approach in Osteometric Sex Estimation: A Comparative Study of Different Linear Machine Learning Algorithms and Training Sample Sizes // Biology, 11 (2022), 6; 917, 18 doi:10.3390/biology11060917 (international peer review, article, scientific)

CROSBI ID: 1200116

Source
Biology (ISSN 2079-7737), 11 (2022), 6; 917, 18

Type, subtype and category of work
Journal papers, article, scientific

Keywords
machine learning algorithms; adjusted binary classification; osteometric sex estimation; optimal training sample size

Abstract
The adjusted binary classification (ABC) approach was proposed to ensure that a binary classification model reaches a predefined accuracy level. The present study evaluated ABC for osteometric sex classification using four machine learning (ML) techniques: linear discriminant analysis (LDA), boosted generalized linear model (GLMB), support vector machine (SVM), and logistic regression (LR). We used 13 femoral measurements of 300 individuals from a modern Turkish population sample and split the data into a training set (n = 240) and a testing set (n = 60). The five best-performing measurements were then selected for training univariate models, and combinations of these variables were used for the multivariate models. The type of ML classifier did not affect the performance of the unadjusted models. The accuracy of univariate models was 82–87%, and that of multivariate models was 89–90%. After ABC was applied to the cross-validation set, the accuracy and the positive and negative predictive values of both univariate and multivariate models were ≥95%. Univariate models estimated sex for 28–75% of individuals, but with an obvious sexing bias, likely caused by different degrees of sexual dimorphism and between-group overlap. Multivariate models minimized this bias and properly classified 81–87% of individuals. A similar performance was observed in the testing sample (except for FEB), with accuracies of 96–100% and a proportion of classified individuals of 30–82% for univariate models and 90–91% for multivariate models. When different training sample sizes were compared, LR was the most sensitive to limited sample sizes (n < 150), while GLMB was the most stable classifier.
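The abstract describes ABC only at a high level. As a rough illustration, the Python sketch below assumes that ABC works by tuning two probability cutoffs on held-out predictions: individuals whose predicted probability of being male is at or above the upper cutoff are classified as male, those at or below the lower cutoff as female, and everyone in between is left unclassified, so that the positive predictive value (male taken as the positive class) and the negative predictive value each reach a target such as 0.95. The function names `abc_cutoffs` and `abc_evaluate`, the 0/1 sex coding, and the rule of picking the most permissive cutoff that meets the target are hypothetical choices for this sketch, not the authors' published procedure. The probabilities could come from any classifier that outputs one.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical sketch of threshold adjustment in the spirit of ABC.
# Assumptions: label 1 = male (positive class), 0 = female; `probs` holds
# each individual's predicted probability of being male from any classifier.


def _share(labels: Sequence[int], wanted: int) -> float:
    """Fraction of `labels` equal to `wanted` (caller ensures non-empty)."""
    return sum(1 for y in labels if y == wanted) / len(labels)


def abc_cutoffs(probs: Sequence[float], labels: Sequence[int],
                target: float = 0.95) -> Tuple[float, float]:
    """Return (female_cut, male_cut).

    male_cut is the lowest candidate cutoff whose "male" zone (p >= cut) is
    at least `target` truly male. female_cut is the highest candidate cutoff
    below male_cut whose "female" zone (p <= cut) is at least `target` truly
    female, so the two zones never overlap (choosing the male cutoff first is
    an arbitrary choice of this sketch). If no cutoff meets the target, that
    sex is never assigned (cutoff set to +inf or -inf).
    """
    candidates = sorted(set(probs))
    male_cut = float("inf")
    for t in candidates:  # ascending: first hit classifies the most people
        zone = [y for p, y in zip(probs, labels) if p >= t]
        if zone and _share(zone, 1) >= target:
            male_cut = t
            break
    female_cut = float("-inf")
    for t in reversed([c for c in candidates if c < male_cut]):  # descending
        zone = [y for p, y in zip(probs, labels) if p <= t]
        if zone and _share(zone, 0) >= target:
            female_cut = t
            break
    return female_cut, male_cut


def abc_evaluate(probs: Sequence[float], labels: Sequence[int],
                 female_cut: float, male_cut: float) -> Dict[str, float]:
    """Accuracy, PPV and NPV among classified individuals, plus share classified."""
    preds: List[Optional[int]] = []
    for p in probs:
        if p >= male_cut:
            preds.append(1)
        elif p <= female_cut:
            preds.append(0)
        else:
            preds.append(None)  # indeterminate zone: left unclassified
    pairs = [(y, yhat) for y, yhat in zip(labels, preds) if yhat is not None]
    called_male = [y for y, yhat in pairs if yhat == 1]
    called_female = [y for y, yhat in pairs if yhat == 0]
    nan = float("nan")
    return {
        "accuracy": sum(y == yhat for y, yhat in pairs) / len(pairs) if pairs else nan,
        "ppv": _share(called_male, 1) if called_male else nan,
        "npv": _share(called_female, 0) if called_female else nan,
        "classified": len(pairs) / len(probs),
    }


if __name__ == "__main__":
    # Invented toy predictions, not data from the study.
    probs = [0.05, 0.10, 0.20, 0.35, 0.45, 0.55, 0.60, 0.80, 0.90, 0.95]
    labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
    f_cut, m_cut = abc_cutoffs(probs, labels, target=0.95)
    print(f_cut, m_cut)  # -> 0.35 0.6
    print(abc_evaluate(probs, labels, f_cut, m_cut))
    # -> {'accuracy': 1.0, 'ppv': 1.0, 'npv': 1.0, 'classified': 0.8}
```

In the study the cutoffs were set on the cross-validation set and then checked on a separate testing sample. The toy example tunes and evaluates on the same ten invented predictions only to stay short, which gives an optimistic picture. The `classified` value plays the role of the proportion of individuals for whom sex could be estimated.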

Original language
Croatian

Scientific fields
Basic medical sciences, Security and defence sciences, Ethnology and anthropology, Cognitive science (natural sciences, technical sciences, biomedicine and health, social sciences and humanities)

LINKS

Institutions:
KBC Split,
School of Medicine, Split,
University of Split, University Department of Forensic Sciences

Profiles:

Ivana Kružić (author)