Bibliographic record no.: 342299

Rule Based Chunker for Croatian


Vučković, Kristina; Tadić, Marko; Dovedan, Zdravko
Rule Based Chunker for Croatian // Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC2008) / Calzolari, Nicoletta ; Choukri, Khalid ; Maegaard, Bente ; Mariani, Joseph ; Odijk, Jan ; Piperidis, Stelios ; Tapias, Daniel (eds.).
Marrakech: European Language Resources Association (ELRA), 2008. pp. 2544-2549


CROSBI ID: 342299

Title
Rule Based Chunker for Croatian

Authors
Vučković, Kristina ; Tadić, Marko ; Dovedan, Zdravko

Type, subtype and category of work
Book chapters, scientific

Book
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC2008)

Editor(s)
Calzolari, Nicoletta ; Choukri, Khalid ; Maegaard, Bente ; Mariani, Joseph ; Odijk, Jan ; Piperidis, Stelios ; Tapias, Daniel

Publisher
European Language Resources Association (ELRA)

City
Marrakech

Year
2008

Page range
2544-2549

ISBN
2-9517408-4-0

Keywords
chunker, rule based, local regular grammar, Croatian

Abstract
In this paper we discuss a rule-based approach to chunking sentences in Croatian, implemented using local regular grammars within the NooJ development environment. We describe the rules and their implementation as regular grammars, and at the same time show that in the NooJ environment it is extremely easy to fine-tune their different sub-rules. Since Croatian has strong morphosyntactic features that are shared between most or all elements of a chunk, the rules are built by taking these features into account and relying strongly on them. For the evaluation of our chunker we used an extracted set of manually annotated sentences from a 100 kw MSD-tagged and disambiguated Croatian corpus. Our chunker performed best on VP-chunks (F: 97.01), while NP-chunks (F: 92.31) and PP-chunks (F: 83.08) were of lower quality. The results are comparable to chunker performance in the CoNLL-2000 shared task on chunking.
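The idea of building chunk rules on shared morphosyntactic features can be illustrated with a small sketch. This is not the authors' NooJ grammar: it is a plain Python approximation under stated assumptions. The `Token` class, the tag values, the function names `agrees` and `np_chunks`, and the single rule "zero or more adjectives followed by a noun, all agreeing in case, number and gender" are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    """A word with hypothetical, simplified morphosyntactic tags."""
    form: str
    pos: str          # assumed coarse tags: "A" adjective, "N" noun, "V" verb
    case: str = ""
    number: str = ""
    gender: str = ""


def agrees(a: Token, b: Token) -> bool:
    """True if two tokens share case, number and gender."""
    return (a.case, a.number, a.gender) == (b.case, b.number, b.gender)


def np_chunks(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, for the pattern A* N
    in which every element agrees with the first element of the span."""
    spans = []
    i = 0
    while i < len(tokens):
        start = i
        # Consume adjectives as long as they agree with the first one.
        while (i < len(tokens) and tokens[i].pos == "A"
               and agrees(tokens[start], tokens[i])):
            i += 1
        # The span is a chunk only if an agreeing noun closes it.
        if (i < len(tokens) and tokens[i].pos == "N"
                and (i == start or agrees(tokens[start], tokens[i]))):
            spans.append((start, i + 1))
            i += 1
        else:
            # No chunk starts here; retry from the next token.
            i = start + 1
    return spans


# "Vidim veliku crvenu kuću" ("I see a big red house")
sentence = [
    Token("Vidim", "V"),
    Token("veliku", "A", "acc", "sg", "f"),
    Token("crvenu", "A", "acc", "sg", "f"),
    Token("kuću", "N", "acc", "sg", "f"),
]
print(np_chunks(sentence))  # [(1, 4)]
```

In this sketch an adjective whose features do not match the following noun is left outside the chunk, which mirrors the abstract's point that agreement between chunk elements is what the rules rely on. A real grammar would cover many more patterns (pronouns, numerals, prepositional and verbal chunks) than this single rule.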

Original language
English

Scientific fields
Information and Communication Sciences, Philology



RELATED ITEMS


Projects:
036-1300646-1986 - Knowledge Discovery in Textual Data (Dalbelo-Bašić, Bojana, MZO)
130-1300646-0645 - Croatian Language Resources and Their Annotation (Tadić, Marko, MZOS)
130-1300646-1776 - Computational Syntax of Croatian (Dovedan Han, Zdravko, MZOS)

Institutions:
Faculty of Humanities and Social Sciences, Zagreb

Profiles:

Kristina Kocijan (author)

Marko Tadić (author)

Zdravko Dovedan Han (author)

Links to the full text:

Full-text access: www.lrec-conf.org

Cite this publication:

Vučković, K., Tadić, M. & Dovedan, Z. (2008) Rule Based Chunker for Croatian. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S. & Tapias, D. (eds.) Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC2008). Marrakech, European Language Resources Association (ELRA), pp. 2544-2549.
@incollection{inbook,
  author    = {Vu\v{c}kovi\'{c}, Kristina and Tadi\'{c}, Marko and Dovedan, Zdravko},
  title     = {Rule Based Chunker for Croatian},
  booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC2008)},
  publisher = {European Language Resources Association (ELRA)},
  address   = {Marrakech},
  year      = {2008},
  pages     = {2544--2549},
  isbn      = {2-9517408-4-0},
  keywords  = {chunker, rule based, local regular grammar, Croatian}
}