The SETimes.HR Linguistically Annotated Corpus of Croatian

Agić, Željko; Ljubešić, Nikola

Data source: CROSBI

The SETimes.HR Linguistically Annotated Corpus of Croatian (CROSBI ID 610829)

Conference paper in proceedings | original scientific paper | international peer review

Agić, Željko ; Ljubešić, Nikola. The SETimes.HR Linguistically Annotated Corpus of Croatian // Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014) / Calzolari, Nicoletta ; Choukri, Khalid ; Declerck, Thierry et al. (eds.). Reykjavík: European Language Resources Association (ELRA), 2014, pp. 1724-1727

Responsibility details

Authors

Agić, Željko ; Ljubešić, Nikola

Basic information in the original language

Language

English

Title

The SETimes.HR Linguistically Annotated Corpus of Croatian

Abstract

We present SETIMES.HR, the first linguistically annotated corpus of Croatian that is freely available for all purposes. The corpus is built on top of the SETIMES parallel corpus of nine Southeast European languages and English. It is manually annotated for lemmas, morphosyntactic tags, named entities and dependency syntax. We couple the corpus with domain-sensitive test sets for Croatian and Serbian to support direct model transfer evaluation between these closely related languages. We build and evaluate statistical models for lemmatization, morphosyntactic tagging, named entity recognition and dependency parsing on top of SETIMES.HR and the test sets, providing the state of the art in all the tasks. We make all resources presented in the paper freely available under a very permissive licensing scheme.

Keywords

dependency treebank; Croatian language; free availability

Note

Not recorded

Basic information in other languages

Not recorded

Contribution details

Pages

1724-1727.

Year of publication

2014.

Publication status

Published

Host publication details

Title

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

Editors

Calzolari, Nicoletta ; Choukri, Khalid ; Declerck, Thierry ; Loftsson, Hrafn ; Maegaard, Bente ; Mariani, Joseph ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios

Publisher

Reykjavík: European Language Resources Association (ELRA)

ISBN

978-2-9517408-8-4

Conference details

Conference

Ninth International Conference on Language Resources and Evaluation (LREC 2014)

Type of participation

Poster

Conference dates

26.05.2014-31.05.2014

Conference venue

Reykjavík, Iceland

Related records

Related persons

Nikola Ljubešić (author)

Željko Agić (author)

Related institutions

Filozofski fakultet u Zagrebu (130) (author's institution)

Related projects

Računalna sintaksa hrvatskoga jezika (result of work on the project)

Field

Information and communication sciences

Links

lrec-conf.org