Data source: CROSBI

Improving Chunking Accuracy on Croatian Texts by Morphosyntactic Tagging (CROSBI ID 560415)

Conference paper in proceedings | original scientific paper | international peer review

Vučković, Kristina ; Agić, Željko ; Tadić, Marko. Improving Chunking Accuracy on Croatian Texts by Morphosyntactic Tagging // Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010) / Calzolari, Nicoletta ; Choukri, Khalid ; Maegaard, Bente et al. (eds.). Valletta: European Language Resources Association (ELRA), 2010. pp. 1944-1949

Responsibility details

Vučković, Kristina ; Agić, Željko ; Tadić, Marko

English

Improving Chunking Accuracy on Croatian Texts by Morphosyntactic Tagging

In this paper, we present the results of an experiment in which a stochastic morphosyntactic tagger is used as a pre-processing module for a rule-based chunker and partial parser for Croatian, with the aim of raising its overall chunking and partial parsing accuracy on Croatian texts. To conduct the experiment, we manually chunked and partially parsed 459 sentences from the Croatia Weekly 100 kw newspaper sub-corpus of the Croatian National Corpus; these sentences had previously been morphosyntactically disambiguated and lemmatized. Due to the lack of resources of this type, these sentences were designated as a temporary chunking and partial parsing gold standard for Croatian. We then evaluated the chunker and partial parser in three scenarios: (1) chunking text with no prior morphosyntactic tagging, (2) chunking text tagged by the stochastic morphosyntactic tagger for Croatian, and (3) chunking manually tagged text. The F1-scores obtained for the three scenarios were, respectively, 0.875 (P: 0.826, R: 0.930), 0.900 (P: 0.866, R: 0.937) and 0.930 (P: 0.912, R: 0.949). The paper describes the language resources and tools used in the experiment and its setup, and discusses the results and perspectives for future work.
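The reported F1-scores can be checked against the reported precision and recall values. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall, which the abstract does not state explicitly; the function name `f1_score` and the scenario labels are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract; labels are illustrative
scenarios = {
    "untagged": (0.826, 0.930),
    "stochastically tagged": (0.866, 0.937),
    "manually tagged": (0.912, 0.949),
}

# Round to three decimals to match the precision used in the abstract
f1_by_scenario = {
    name: round(f1_score(p, r), 3) for name, (p, r) in scenarios.items()
}

for name, f1 in f1_by_scenario.items():
    print(f"{name}: F1 = {f1:.3f}")
```

Under that assumption, the recomputed values agree with the abstract to three decimal places (0.875, 0.900 and 0.930).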

chunking; partial parsing; morphosyntactic tagging

Contribution details

1944-1949.

2010.

published

2-9517408-6-7

Host publication details

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

Calzolari, Nicoletta ; Choukri, Khalid ; Maegaard, Bente ; Mariani, Joseph ; Odijk, Jan ; Piperidis, Stelios ; Rosner, Mike ; Tapias, Daniel

Valletta: European Language Resources Association (ELRA)

Conference details

Seventh International Conference on Language Resources and Evaluation (LREC 2010)

poster

17.05.2010-23.05.2010

Valletta, Malta

Research areas

Information and Communication Sciences, Philology
