Phenotype prediction with semi-supervised learning

Levatić, Jurica; Brbić, Maria; Stepišnik Perdih, Tomaž; Kocev, Dragi; Vidulin, Vedrana; Šmuc, Tomislav; Supek, Fran; Džeroski, Sašo

Bibliographic record number: 909804

Phenotype prediction with semi-supervised learning // New frontiers in mining complex patterns NFMCP 2017, Lecture Notes in Computer Science
Skopje, North Macedonia, 2017, pp. 1-11 (lecture, international peer review, full paper (in extenso), scientific)

CROSBI ID: 909804

Title
Phenotype prediction with semi-supervised learning

Authors
Levatić, Jurica ; Brbić, Maria ; Stepišnik Perdih, Tomaž ; Kocev, Dragi ; Vidulin, Vedrana ; Šmuc, Tomislav ; Supek, Fran ; Džeroski, Sašo

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
New frontiers in mining complex patterns NFMCP 2017, Lecture Notes in Computer Science, 2017, pp. 1-11

Conference
New frontiers in mining complex patterns: Sixth edition of the International Workshop NFMCP 2017 in conjunction with ECML-PKDD 2017

Place and date
Skopje, North Macedonia, 18-22 September 2017

Type of participation
Lecture

Type of review
International peer review

Keywords
semi-supervised learning ; phenotype ; decision trees ; predictive clustering trees ; random forests ; binary classification

Abstract
In this work, we address the task of phenotypic trait prediction using methods for semi-supervised learning. More specifically, we propose to use supervised and semi-supervised classification trees as well as supervised and semi-supervised random forests of classification trees. We consider 114 datasets for different phenotypic traits referring to 997 microbial species. These datasets present a challenge for existing machine learning methods: they are not entirely labeled/annotated and their class distribution is typically imbalanced. We investigate whether approaching phenotype prediction as a semi-supervised learning task can yield improved predictive performance. The results suggest that the semi-supervised methodology considered here is helpful for phenotype prediction when the amount of labeled data ranges from 20% to 40%. Furthermore, the semi-supervised classification trees exhibit good predictive performance for datasets where the presence of a given trait is not extremely imbalanced (i.e., less than 6%).

Original language
English

Scientific fields
Computer science

AFFILIATIONS OF THE WORK

Institutions:
Ruđer Bošković Institute, Zagreb

Profiles:

Vedrana Vidulin (author)