Bibliographic record no.: 698032

The SETimes.HR Linguistically Annotated Corpus of Croatian


Agić, Željko; Ljubešić, Nikola
The SETimes.HR Linguistically Annotated Corpus of Croatian // Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014) / Calzolari, Nicoletta ; Choukri, Khalid ; Declerck, Thierry ; Loftsson, Hrafn ; Maegaard, Bente ; Mariani, Joseph ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios (eds.).
Reykjavík: European Language Resources Association (ELRA), 2014. pp. 1724-1727 (poster, international peer review, full paper (in extenso), scientific)


CROSBI ID: 698032

Title
The SETimes.HR Linguistically Annotated Corpus of Croatian

Authors
Agić, Željko ; Ljubešić, Nikola

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014) / Calzolari, Nicoletta ; Choukri, Khalid ; Declerck, Thierry ; Loftsson, Hrafn ; Maegaard, Bente ; Mariani, Joseph ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios - Reykjavík : European Language Resources Association (ELRA), 2014, 1724-1727

ISBN
978-2-9517408-8-4

Conference
Ninth International Conference on Language Resources and Evaluation (LREC 2014)

Place and date
Reykjavík, Iceland, 26.05.2014 - 31.05.2014

Type of participation
Poster

Type of review
International peer review

Keywords
dependency treebank; Croatian language; free availability

Abstract
We present SETIMES.HR, the first linguistically annotated corpus of Croatian that is freely available for all purposes. The corpus is built on top of the SETIMES parallel corpus of nine Southeast European languages and English. It is manually annotated for lemmas, morphosyntactic tags, named entities and dependency syntax. We couple the corpus with domain-sensitive test sets for Croatian and Serbian to support direct model transfer evaluation between these closely related languages. We build and evaluate statistical models for lemmatization, morphosyntactic tagging, named entity recognition and dependency parsing on top of SETIMES.HR and the test sets, providing the state of the art in all the tasks. We make all resources presented in the paper freely available under a very permissive licensing scheme.

Original language
English

Scientific fields
Information and communication sciences



PAPER AFFILIATIONS


Projects:
130-1300646-1776 - Računalna sintaksa hrvatskoga jezika (Dovedan Han, Zdravko, MZOS) (CroRIS)

Institutions:
Filozofski fakultet, Zagreb

Profiles:

Željko Agić (author)

Nikola Ljubešić (author)

Links to the full text of the paper:

Access to the full text of the paper: www.lrec-conf.org

Cite this publication:

Agić, Ž. & Ljubešić, N. (2014) The SETimes.HR Linguistically Annotated Corpus of Croatian. In: Calzolari, N., Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J. & Piperidis, S. (eds.) Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014).
@article{article,
  author = {Agi\'{c}, \v{Z}eljko and Ljube\v{s}i\'{c}, Nikola},
  year = {2014},
  pages = {1724-1727},
  keywords = {dependency treebank, Croatian language, free availability},
  isbn = {978-2-9517408-8-4},
  title = {The SETimes.HR Linguistically Annotated Corpus of Croatian},
  keyword = {dependency treebank, Croatian language, free availability},
  publisher = {European Language Resources Association (ELRA)},
  publisherplace = {Reykjav\'{\i}k, Iceland}
}



