Bibliographic record no. 1253068
Multi-label approaches to web genre identification
Multi-label approaches to web genre identification // Journal for language technology and computational linguistics, 24 (2009), 1; 93-110 (international peer review, article, scientific)
CROSBI ID: 1253068
Title
Multi-label approaches to web genre identification
Authors
Vidulin, Vedrana ; Luštrek, Mitja ; Gams, Matjaž
Source
Journal for language technology and computational linguistics (0175-1336) 24 (2009), 1; 93-110
Type, subtype and category of work
Journal papers, article, scientific
Keywords
web genre classification, multi-label classification
Abstract
A web page is a complex document that can share the conventions of several genres or contain several parts, each belonging to a different genre. To properly address this interplay of genres, a recent proposal in automatic web genre identification is multi-label classification. The dominant approach to such classification is to transform one multi-label machine learning problem into several sub-problems of learning binary single-label classifiers, one for each genre. In this paper we explore the multi-class transformation, in which each combination of genres is labeled with a single distinct label. This approach is then compared to the binary approach to determine which one better captures the multi-label aspect of web genres. Experimental results show that both approaches failed to properly address multi-genre web pages. The differences obtained resulted from variations in the recognition of single-genre web pages.
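To make the two problem transformations described in the abstract concrete, here is a minimal Python sketch. It assumes a toy set of hypothetical pages and genre labels (not the paper's corpus or genre set) and only shows how the training labels are rewritten; the classifiers, features and evaluation used in the paper are not shown. In the broader multi-label literature, the binary approach is often called binary relevance and the multi-class approach label powerset.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data (not the paper's corpus): each page has one or more genres.
# Assumption: genre names never contain "+", which is used as a separator below.
pages: List[Tuple[str, Set[str]]] = [
    ("page1", {"blog"}),
    ("page2", {"blog", "personal"}),
    ("page3", {"shop"}),
    ("page4", {"personal", "blog"}),
]


def binary_transform(data: List[Tuple[str, Set[str]]]) -> Dict[str, List[Tuple[str, bool]]]:
    """Binary approach: one yes/no training set per genre (one classifier each)."""
    genres = sorted(set().union(*(labels for _, labels in data)))
    return {g: [(page, g in labels) for page, labels in data] for g in genres}


def multiclass_transform(data: List[Tuple[str, Set[str]]]) -> List[Tuple[str, str]]:
    """Multi-class approach: each distinct genre combination becomes one class label."""
    # Sorting makes {"personal", "blog"} and {"blog", "personal"} map to the same class.
    return [(page, "+".join(sorted(labels))) for page, labels in data]


def decode_binary(predictions: Dict[str, bool]) -> Set[str]:
    """Combine the per-genre binary decisions back into a set of genres."""
    return {genre for genre, is_member in predictions.items() if is_member}


def decode_multiclass(label: str) -> Set[str]:
    """Split a single combined class label back into its set of genres."""
    return set(label.split("+"))


if __name__ == "__main__":
    for genre, rows in binary_transform(pages).items():
        print(genre, rows)
    print(multiclass_transform(pages))
```

In this sketch, page2 and page4 end up in the same class "blog+personal", while the binary approach produces three separate yes/no problems (blog, personal, shop). A known trade-off of the multi-class transformation is that a classifier trained this way can only predict genre combinations that occur in its training data.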
Original language
English