Hierarchy Decomposition Pipeline: A Toolbox for Comparison of Model Induction Algorithms on Hierarchical Multi-label Classification Problems

Vidulin, Vedrana; Džeroski, Sašo

Data source: CROSBI

Hierarchy Decomposition Pipeline: A Toolbox for Comparison of Model Induction Algorithms on Hierarchical Multi-label Classification Problems (CROSBI ID 732206)

Conference contribution in a journal | original scientific paper | international peer review

Vidulin, Vedrana ; Džeroski, Sašo Hierarchy Decomposition Pipeline: A Toolbox for Comparison of Model Induction Algorithms on Hierarchical Multi-label Classification Problems // Lecture notes in computer science / Appice, A. ; Tsoumakas, G. ; Manolopoulos, Y. et al. (eds.). 2020. pp. 486-501 doi: 10.1007/978-3-030-61527-7_32

Responsibility information

Authors

Vidulin, Vedrana ; Džeroski, Sašo

Basic information in the original language
Basic information in other languages

Language

English

Title

Hierarchy Decomposition Pipeline: A Toolbox for Comparison of Model Induction Algorithms on Hierarchical Multi-label Classification Problems

Abstract

Hierarchical multi-label classification (HMC) is a supervised machine learning task in which each example can be assigned more than one label and the possible labels are organized in a hierarchy. HMC problems emerge in domains such as functional genomics, habitat modelling, and text and image categorization. They can be addressed with global model induction algorithms, which induce a single model that predicts the complete hierarchy, as well as with local algorithms, which induce multiple models that predict different segments of the hierarchy. However, there is no consensus on which of these approaches performs best across different domains, especially in the setting of learning ensembles. We introduce the hierarchy decomposition pipeline, a publicly available toolbox for comparison of model induction algorithms on HMC problems in an ensemble setting. The pipeline comprises five algorithms: one that predicts the complete hierarchy, and others that perform partial or complete hierarchy decompositions. One of these is the novel “label specialization” algorithm, which constructs, for each parent label in a hierarchy, a local multi-label classification model that simultaneously predicts the respective child labels. We apply the pipeline to ten HMC data sets from four domains, which have both tree and directed acyclic graph label hierarchies, and confirm that there is no single best algorithm for all HMC problems. This finding shows the need for a pipeline that enables users to choose the best-performing algorithm for their HMC data set. Finally, we show that the choice can be narrowed to a specific type of algorithm, based on the characteristics of the label hierarchy and the data set label cardinality.
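The per-parent decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' toolbox: the toy hierarchy, the example names, and the helper functions `children_by_parent`, `decompose`, and `label_cardinality` are hypothetical, and the rule that a parent's sub-problem only uses examples carrying that parent label is an assumption made for illustration. Label cardinality is computed here as the average number of labels per example.

```python
from collections import defaultdict

# Hypothetical toy hierarchy as (parent, child) edges; "root" is an artificial top node.
EDGES = [("root", "A"), ("root", "B"), ("A", "A1"), ("A", "A2"), ("B", "B1")]

# Hypothetical examples, each mapped to its label set (ancestor labels included).
EXAMPLES = {
    "x1": {"A", "A1"},
    "x2": {"A", "A1", "A2"},
    "x3": {"B", "B1"},
    "x4": {"A", "B", "B1"},
}


def children_by_parent(edges):
    """Group child labels under each parent; works for trees and DAGs alike."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    return dict(children)


def decompose(edges, examples):
    """Build one multi-label sub-problem per parent label.

    The targets of a sub-problem are the parent's children. Assumption: an
    example is included only if it carries the parent label (root takes all).
    """
    tasks = {}
    for parent, kids in children_by_parent(edges).items():
        rows = {}
        for name, labels in examples.items():
            if parent == "root" or parent in labels:
                rows[name] = [int(kid in labels) for kid in kids]
        tasks[parent] = {"targets": kids, "rows": rows}
    return tasks


def label_cardinality(examples):
    """Average number of labels assigned to an example."""
    return sum(len(labels) for labels in examples.values()) / len(examples)


if __name__ == "__main__":
    for parent, task in decompose(EDGES, EXAMPLES).items():
        print(parent, task["targets"], task["rows"])
    print("label cardinality:", label_cardinality(EXAMPLES))
```

Each entry of the resulting dictionary could then be handed to any multi-label learner; a global approach would instead train a single model on the full label set.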

Keywords

Hierarchical multi-label classification, Hierarchy decomposition, Structured prediction

Note

not recorded

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Contribution details

Pages

486-501.

Year of publication

2020.

Volume (issue)

not recorded

Publication status

published

DOI

10.1007/978-3-030-61527-7_32

Parent publication details

Title

Lecture notes in computer science

Editors

Appice, A. ; Tsoumakas, G. ; Manolopoulos, Y. ; Matwin, S.

Publisher

Springer

ISBN

978-3-030-61526-0

ISSN

0302-9743

Conference details

Conference

23rd International Conference on Discovery Science (DS 2020)

Type of participation

lecture

Conference dates

19.10.2020-21.10.2020

Conference venue

online

Related records

Related persons

Vedrana Vidulin (author)

Field

not recorded

Links

doi.org

Indexing

Scopus