Data source: CROSBI

Automatic Authorship Attribution for Texts in Croatian Language Using Combinations of Features (CROSBI ID 162818)

Journal contribution | original scientific paper | international peer review

Reicher, Tomislav ; Krišto, Ivan ; Belša, Igor ; Šilić, Artur Automatic Authorship Attribution for Texts in Croatian Language Using Combinations of Features // Lecture Notes in Computer Science, 6277 (2010), 21-30. doi: 10.1007/978-3-642-15390-7

Authorship details

Reicher, Tomislav ; Krišto, Ivan ; Belša, Igor ; Šilić, Artur

Language: English

Automatic Authorship Attribution for Texts in Croatian Language Using Combinations of Features

In this work we investigate the use of various character-, lexical-, and syntactic-level features and their combinations in automatic authorship attribution. Since the majority of text representation features are language-specific, we examine their application to texts written in the Croatian language. Our work differs from similar work in at least three aspects. First, we use a slightly different set of features than previously proposed. Second, we use four different data sets and compare the same features across those data sets to draw stronger conclusions. The data sets consist of articles, blogs, books, and forum posts written in Croatian. Finally, we employ a classification method based on a strong classifier: we use Support Vector Machines to learn classifiers that achieve excellent results for longer texts, namely 91% accuracy and F1 measure for blogs, 93% accuracy and F1 for articles, and 99% accuracy and F1 for books. Experiments conducted on forum posts show that more complex features need to be employed for shorter texts.
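To make the idea of "combinations of features" concrete, here is a minimal sketch, not the authors' implementation, of how a lexical-level feature group (relative frequencies of function words) and a character-level feature group (relative frequencies of character n-grams) might be computed and concatenated into one vector per text. The function-word list below is a small hypothetical sample of common Croatian words, and the function names are illustrative only. POS n-grams are omitted because they require a Croatian part-of-speech tagger. Vectors like these could then be passed to an SVM implementation such as LIBSVM or scikit-learn's `LinearSVC`.

```python
from collections import Counter
from typing import List

# Hypothetical sample of Croatian function words; the paper's actual list is not given here.
FUNCTION_WORDS: List[str] = ["i", "je", "se", "u", "na", "da", "za", "od", "ne", "a"]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it on any non-letter character (Unicode-aware)."""
    tokens: List[str] = []
    current: List[str] = []
    for ch in text.lower():
        if ch.isalpha():
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def function_word_features(text: str) -> List[float]:
    """Lexical-level group: relative frequency of each function word among all tokens."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def char_ngram_features(text: str, vocab: List[str], n: int = 2) -> List[float]:
    """Character-level group: relative frequency of each n-gram from a fixed vocabulary."""
    lowered = text.lower()
    grams = [lowered[i:i + n] for i in range(len(lowered) - n + 1)]
    counts = Counter(grams)
    total = len(grams) or 1
    return [counts[g] / total for g in vocab]


def combined_features(text: str, char_vocab: List[str]) -> List[float]:
    """Combine feature groups by concatenating them into a single vector."""
    return function_word_features(text) + char_ngram_features(text, char_vocab)


if __name__ == "__main__":
    vec = combined_features("Ivan je rekao da se vraća i da je sve u redu.", ["je", "se", "na"])
    print(len(vec), vec[:3])
```

Concatenation is the simplest way to combine feature groups. Because each group is normalized to relative frequencies, texts of different lengths remain comparable, which matters when longer texts (books, articles) and short ones (forum posts) are classified with the same feature set.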

author attribution; function words; POS n-grams; feature combinations; SVM

Publication details

Volume: 6277

Year: 2010

Pages: 21-30

Status: published

ISSN: 0302-9743

DOI: 10.1007/978-3-642-15390-7

Research field

Computing
