Data source: CROSBI

Web genre classification with methods for structured output prediction (CROSBI ID 320949)

Journal article | original scientific paper | international peer review

Madjarov, Gjorgji ; Vidulin, Vedrana ; Dimitrovski, Ivica ; Kocev, Dragi Web genre classification with methods for structured output prediction // Information sciences, 503 (2019), 551-573. doi: 10.1016/j.ins.2019.07.009

Responsibility details

Madjarov, Gjorgji ; Vidulin, Vedrana ; Dimitrovski, Ivica ; Kocev, Dragi

English

Web genre classification with methods for structured output prediction

The growing number of web pages calls for improvements to search engines. One such improvement is letting users specify the desired web genre of the returned pages. The predicted web genre sets expectations about the type of information a given web page contains. More specifically, web genres can be seen as textual categories such as scientific papers, home pages or e-shops. Arguably, in the context of web search, specifying a genre in addition to topical keywords enables a user to easily find, for example, a scientific paper (genre) about text mining (topic). Typically, web genre prediction is treated as a multi-class classification task, with some recent studies advocating the introduction of structure in the output space, either by considering multiple web genres per web page or by exploiting a hierarchy of web genres. We investigate the structuring of the output space by constructing hierarchies with data-driven methods, with the help of experts, or even randomly. We also use 10 different representations of the web pages. We use predictive clustering trees and ensembles thereof to properly assess the influence of the different information sources. The experimental evaluation is performed on two benchmark corpora: 20-genre and SANTINIS-ML. The results reveal that exploiting a hierarchy of web genres yields the best predictive performance across both datasets, all predictive models, all feature sets and all hierarchies. Next, data-driven hierarchy construction is at least as good as an expert-constructed hierarchy, with the added value that the hierarchy construction is automatic and fast. Furthermore, ensembles offer state-of-the-art predictive performance and outperform single-tree models.

Web genre classification, Hierarchy construction, Hierarchical multi-label classification


Publication details

Volume: 503

Year: 2019

Pages: 551-573

Status: published

ISSN: 0020-0255

DOI: 10.1016/j.ins.2019.07.009
