Hierarchy Decomposition Pipeline: A Toolbox for Comparison of Model Induction Algorithms on Hierarchical Multi-label Classification Problems (CROSBI ID 732206)
Conference contribution in a journal | original scientific paper | international peer review
Responsibility information
Vidulin, Vedrana ; Džeroski, Sašo
English
Hierarchical multi-label classification (HMC) is a supervised machine learning task in which each example can be assigned more than one label and the possible labels are organized in a hierarchy. HMC problems arise in domains such as functional genomics, habitat modelling, and text and image categorization. They can be addressed with global model induction algorithms, which induce a single model that predicts the complete hierarchy, or with local algorithms, which induce multiple models that predict different segments of the hierarchy. However, there is no consensus about which of these approaches performs best across domains, especially when learning ensembles. We introduce the hierarchy decomposition pipeline, a publicly available toolbox for comparing model induction algorithms on HMC problems in an ensemble setting. The pipeline comprises five algorithms: one that predicts the complete hierarchy, and others that perform partial or complete hierarchy decompositions. One of them is the novel “label specialization” algorithm, which constructs, for each parent label in the hierarchy, a local multi-label classification model that simultaneously predicts that parent's child labels. We apply the pipeline to ten HMC data sets from four domains, with both tree and directed acyclic graph label hierarchies, and confirm that no single algorithm is best for all HMC problems. This finding shows the need for a pipeline that lets users choose the best-performing algorithm for their HMC data set. Finally, we show that the choice can be narrowed to a specific type of algorithm based on the characteristics of the label hierarchy and the label cardinality of the data set.
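The decomposition idea behind the “label specialization” algorithm can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy hierarchy, and the example label sets are all hypothetical, and the sketch only shows how a label hierarchy might be split into one multi-label target per parent label. Which examples are used to train each local model, and which learner is used, are left open here.

```python
from collections import defaultdict


def children_by_parent(parents_of):
    """Invert a child -> parents mapping into parent -> sorted children."""
    grouped = defaultdict(set)
    for child, parents in parents_of.items():
        for parent in parents:
            grouped[parent].add(child)
    return {parent: sorted(kids) for parent, kids in grouped.items()}


def local_targets(example_labels, children):
    """For each parent, build one binary row per example over that parent's children."""
    return {
        parent: [[int(kid in labels) for kid in kids] for labels in example_labels]
        for parent, kids in children.items()
    }


# Hypothetical toy hierarchy; label "c" has two parents, so this is a DAG, not a tree.
parents_of = {"a": ["root"], "b": ["root"], "c": ["a", "b"], "d": ["a"]}

# Hypothetical label sets for three examples.
example_labels = [{"a", "c"}, {"b"}, {"a", "b", "c", "d"}]

children = children_by_parent(parents_of)
targets = local_targets(example_labels, children)
```

Each entry of `targets` is the target matrix for one local multi-label problem, with one column per child of the corresponding parent. A separate multi-label model could then be trained on each of these matrices.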
Hierarchical multi-label classification, Hierarchy decomposition, Structured prediction
Contribution details
486-501.
2020.
published
10.1007/978-3-030-61527-7_32
Host publication details
Lecture notes in computer science
Appice, A. ; Tsoumakas, G. ; Manolopoulos, Y. ; Matwin, S.
Springer
978-3-030-61526-0
0302-9743
Conference details
23rd International Conference on Discovery Science (DS 2020)
lecture
19.10.2020-21.10.2020
online