
Bibliographic record no. 463884

Automatic Authorship Attribution for Texts in Croatian Language Using Combinations of Features


Reicher, Tomislav; Krišto, Ivan; Belša, Igor; Šilić, Artur
Automatic Authorship Attribution for Texts in Croatian Language Using Combinations of Features // Lecture Notes in Computer Science, 6277 (2010), 21-30 doi:10.1007/978-3-642-15390-7 (international peer review, article, scientific)


CROSBI ID: 463884

Title
Automatic Authorship Attribution for Texts in Croatian Language Using Combinations of Features

Authors
Reicher, Tomislav ; Krišto, Ivan ; Belša, Igor ; Šilić, Artur

Source
Lecture Notes in Computer Science (0302-9743) 6277 (2010); 21-30

Type, subtype and category of work
Journal papers, article, scientific

Keywords
author attribution; function words; POS n-grams; feature combinations; SVM

Abstract
In this work we investigate the use of various character-, lexical-, and syntactic-level features and their combinations in automatic authorship attribution. Since the majority of text representation features are language specific, we examine their application to texts written in the Croatian language. Our work differs from similar work in at least three aspects. Firstly, we use a slightly different set of features than previously proposed. Secondly, we use four different data sets and compare the same features across those data sets to draw stronger conclusions. The data sets that we use consist of articles, blogs, books, and forum posts written in the Croatian language. Finally, we employ a classification method based on a strong classifier. We use Support Vector Machines to learn classifiers, which achieve excellent results for longer texts: 91% accuracy and F1 measure for blogs, 93% accuracy and F1 for articles, and 99% accuracy and F1 for books. Experiments conducted on forum posts show that more complex features need to be employed for shorter texts.
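The idea of combining feature groups mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: this record does not list the actual features, so the function-word list, the character trigrams, and all function names below are hypothetical choices made for illustration. The sketch only builds a combined feature vector; training an SVM on such vectors is left out.

```python
from collections import Counter

# Hypothetical, tiny function-word list for illustration only; the list used
# in the paper is not given in this record.
FUNCTION_WORDS = ["i", "u", "na", "je", "se", "da", "za", "od"]


def function_word_features(text, vocabulary=FUNCTION_WORDS):
    """Relative frequency of each function word among whitespace tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in vocabulary]


def char_ngram_features(text, ngrams, n=3):
    """Relative frequency of each listed character n-gram in the text."""
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = len(grams) or 1
    return [counts[g] / total for g in ngrams]


def combined_features(text, ngrams):
    """Concatenate the two feature groups into one vector."""
    return function_word_features(text) + char_ngram_features(text, ngrams)


# Assumed example sentence and trigram selection, chosen only for the demo.
sample = "Ovo je primjer teksta i on je na hrvatskom jeziku."
vector = combined_features(sample, ngrams=[" je", "je ", "na "])
print(len(vector))
```

Each text is mapped to one fixed-length vector (here 8 function-word frequencies followed by 3 trigram frequencies), which is the form a classifier such as an SVM expects as input.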

Original language
English

Scientific fields
Computing



RELATED TO THIS WORK


Projects:
036-1300646-1986 - Otkrivanje znanja u tekstnim podacima (Knowledge Discovery in Textual Data) (Dalbelo-Bašić, Bojana, MZO) (CroRIS)

Institutions:
Faculty of Electrical Engineering and Computing, Zagreb

Profiles:

Artur Šilić (author)

Links to the full text of the work:

doi www.springerlink.com

Cite this publication:

Reicher, Tomislav; Krišto, Ivan; Belša, Igor; Šilić, Artur
Automatic Authorship Attribution for Texts in Croatian Language Using Combinations of Features // Lecture Notes in Computer Science, 6277 (2010), 21-30 doi:10.1007/978-3-642-15390-7 (international peer review, article, scientific)
Reicher, T., Krišto, I., Belša, I. & Šilić, A. (2010) Automatic Authorship Attribution for Texts in Croatian Language Using Combinations of Features. Lecture Notes in Computer Science, 6277, 21-30 doi:10.1007/978-3-642-15390-7.
@article{article, author = {Reicher, Tomislav and Kri\v{s}to, Ivan and Bel\v{s}a, Igor and \v{S}ili\'{c}, Artur}, year = {2010}, pages = {21-30}, DOI = {10.1007/978-3-642-15390-7}, keywords = {author attribution, function words, POS n-grams, feature combinations, SVM}, journal = {Lecture Notes in Computer Science}, doi = {10.1007/978-3-642-15390-7}, volume = {6277}, issn = {0302-9743}, title = {Automatic Authorship Attribution for Texts in Croatian Language Using Combinations of Features}, keyword = {author attribution, function words, POS n-grams, feature combinations, SVM} }

Journal indexed by:


  • Scopus


Included in other bibliographic databases:


  • Compu-Math Citation Index
  • Science Citation Index Expanded

