Data source: CROSBI

Training a Genre Classifier for Automatic Classification of Web Pages (CROSBI ID 321258)

Journal article | original scientific paper | international peer review

Vidulin, Vedrana ; Luštrek, Mitja ; Gams, Matjaž Training a Genre Classifier for Automatic Classification of Web Pages // CIT. Journal of computing and information technology, 15 (2007), 4; 305-311. doi: 10.2498/cit.1001137

Responsibility details

Vidulin, Vedrana ; Luštrek, Mitja ; Gams, Matjaž

English

Training a Genre Classifier for Automatic Classification of Web Pages

This paper presents experiments on classifying web pages by genre. Firstly, a corpus of 1 539 manually labeled web pages was prepared. Secondly, 502 genre features were selected based on the literature and on observation of the corpus. Thirdly, these features were extracted from the corpus to obtain a data set. Finally, two machine learning algorithms, one for the induction of decision trees (J48) and one ensemble algorithm (bagging), were trained and tested on the data set. The ensemble algorithm achieved on average 17% better precision and 1.6% better accuracy, but slightly worse recall; the F-measure did not vary significantly. The results indicate that classification by genre could be a useful addition to search engines.
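The abstract compares the two classifiers on precision, recall, F-measure and accuracy. As a reminder of how these four measures relate, here is a minimal sketch for a single genre treated as a binary decision. The `Counts` class, the function names and the example counts are illustrative assumptions; they are not taken from the paper and do not reproduce its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion counts for one genre (hypothetical helper, not from the paper)."""
    tp: int  # pages correctly assigned to the genre
    fp: int  # pages wrongly assigned to the genre
    fn: int  # pages of the genre that were missed
    tn: int  # pages correctly left out of the genre


def precision(c: Counts) -> float:
    # Share of pages assigned to the genre that really belong to it.
    assigned = c.tp + c.fp
    return c.tp / assigned if assigned else 0.0


def recall(c: Counts) -> float:
    # Share of pages of the genre that the classifier found.
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


def f_measure(c: Counts) -> float:
    # Harmonic mean of precision and recall (the balanced F1 form).
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(c: Counts) -> float:
    # Share of all decisions, positive and negative, that were correct.
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


# Made-up counts, for illustration only.
example = Counts(tp=40, fp=10, fn=20, tn=130)
scores = {
    "precision": precision(example),
    "recall": recall(example),
    "f_measure": f_measure(example),
    "accuracy": accuracy(example),
}
```

Because the F-measure is a harmonic mean, a gain in precision that comes with a loss in recall can leave it nearly unchanged, which fits the pattern the abstract reports.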

genre classification, web page, genre features, ensemble algorithm

Publication details

Volume (issue): 15 (4)

Year: 2007

Pages: 305-311

Status: published

ISSN: 1330-1136

DOI: 10.2498/cit.1001137
