Bibliographic record number: 909804
Phenotype prediction with semi-supervised learning
Phenotype prediction with semi-supervised learning // New frontiers in mining complex patterns NFMCP 2017, Lecture Notes in Computer Science
Skopje, North Macedonia, 2017, pp. 1-11 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 909804
Title
Phenotype prediction with semi-supervised learning
Authors
Levatić, Jurica ; Brbić, Maria ; Stepišnik Perdih, Tomaž ; Kocev, Dragi ; Vidulin, Vedrana ; Šmuc, Tomislav ; Supek, Fran ; Džeroski, Sašo
Type, subtype, and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
New frontiers in mining complex patterns NFMCP 2017, Lecture Notes in Computer Science
2017, pp. 1-11
Conference
New frontiers in mining complex patterns: Sixth edition of the International Workshop NFMCP 2017 in conjunction with ECML-PKDD 2017
Place and date
Skopje, North Macedonia, 18-22 September 2017
Type of participation
Lecture
Type of review
International peer review
Keywords
semi-supervised learning ; phenotype ; decision trees ; predictive clustering trees ; random forests ; binary classification
Abstract
In this work, we address the task of predicting phenotypic traits using semi-supervised learning methods. More specifically, we propose to use supervised and semi-supervised classification trees, as well as supervised and semi-supervised random forests of classification trees. We consider 114 datasets for different phenotypic traits referring to 997 microbial species. These datasets present a challenge for existing machine learning methods: they are not entirely labeled/annotated, and their distribution is typically imbalanced. We investigate whether approaching phenotype prediction as a semi-supervised learning task can yield improved predictive performance. The results suggest that the semi-supervised methodology considered here is helpful for phenotype prediction when the amount of labeled data ranges from 20 to 40%. Furthermore, the semi-supervised classification trees exhibit good predictive performance for datasets where the presence of a given trait is not extremely imbalanced (i.e., less than 6%).
Original language
English
Scientific fields
Computer science
WORK AFFILIATIONS
Institutions:
Institut "Ruđer Bošković", Zagreb