Combining Part-of-Speech Tagger and Inflectional Lexicon for Croatian

Agić, Željko; Tadić, Marko; Dovedan, Zdravko

Bibliographic record number: 363913

Combining Part-of-Speech Tagger and Inflectional Lexicon for Croatian // Proceedings of the 6th Language Technologies Conference / Erjavec, Tomaž ; Žganec Gros, Jerneja (eds.).
Ljubljana: Institut Jožef Stefan, 2008, pp. 116-121 (lecture, international peer review, full paper (in extenso), scientific)

CROSBI ID: 363913

Title
Combining Part-of-Speech Tagger and Inflectional Lexicon for Croatian

Authors
Agić, Željko ; Tadić, Marko ; Dovedan, Zdravko

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the 6th Language Technologies Conference / Erjavec, Tomaž ; Žganec Gros, Jerneja - Ljubljana : Institut Jožef Stefan, 2008, 116-121

ISBN
978-961-264-006-4

Conference
11th Information Society Multiconference (IS 2008) / 6th Language Technologies Conference (IS-LTC 2008)

Place and date
Ljubljana, Slovenia, 16-17 October 2008

Type of participation
Lecture

Type of review
International peer review

Keywords
PoS/MSD tagging; HMM; inflectional lexicon; Croatian language

Abstract
This paper investigates several methods of combining the output of a second-order Hidden Markov Model PoS/MSD tagger and a high-coverage inflectional lexicon for Croatian. Our primary motivation was to improve the overall tagging accuracy on Croatian texts using our newly developed PoS/MSD tagger. We also wanted to compare its tagging results, both standalone and with the morphological lexicon, to those previously reported in (Agić, Tadić, 2006) for the TnT statistical tagger applied to Croatian. We used TnT as a reference point because both taggers implement the second-order HMM tagging procedure. We first explain the basic idea behind the experiment, its motivation, and its importance for processing the Croatian language. We then describe the tools and language resources used in the experiment, including their operating paradigms and the relevant details of their input and output formats. With the basics presented, we describe in theory all the possible methods of combining these resources and tools with respect to their paradigms and their input and production capabilities, and then put these ideas to the test using the de facto standard recall, precision and F-measure framework. The results are then discussed in detail, and conclusions and plans for future work are presented.

Original language
English

Scientific fields
Computing, Information and Communication Sciences, Philology

RELATED TO THIS PAPER

Projects:
036-1300646-1986 - Knowledge Discovery in Textual Data (Dalbelo-Bašić, Bojana, MZO) (CroRIS)
130-1300646-0645 - Croatian Language Resources and Their Annotation (Tadić, Marko, MZOS) (CroRIS)
130-1300646-1776 - Computational Syntax of Croatian (Dovedan Han, Zdravko, MZOS) (CroRIS)

Institutions:
Faculty of Electrical Engineering and Computing, Zagreb
Faculty of Humanities and Social Sciences, Zagreb

Profiles:

Zdravko Dovedan Han (author)