Bibliographic record number: 449387

New version of the Croatian National Corpus


Tadić, Marko
New version of the Croatian National Corpus // After Half a Century of Slavonic Natural Language Processing / Hlaváčková, Dana ; Horák, Aleš ; Osolsobě, Klara ; Rychlý, Pavel (eds.).
Brno: Masarykova univerzita, 2009. pp. 199-205


CROSBI ID: 449387

Title
New version of the Croatian National Corpus

Authors
Tadić, Marko

Type, subtype and category of work
Book chapters, scientific

Book
After Half a Century of Slavonic Natural Language Processing

Editor(s)
Hlaváčková, Dana ; Horák, Aleš ; Osolsobě, Klara ; Rychlý, Pavel

Publisher
Masarykova univerzita

City
Brno

Year
2009

Page range
199-205

ISBN
978-80-7399-815-8

Keywords
corpus, corpus linguistics, Croatian National Corpus, Croatian language

Abstract
This contribution presents the new version (v 2.5) of the Croatian National Corpus (HNK). It begins with a brief history of the collection of HNK and its first two versions. It then describes the differences and novelties introduced in the new version: 1) new text samples that bring the existing corpus structure closer to the desired ideal ensemble of text types, genres and topics; 2) lemmatization and full MSD-tagging of the whole corpus. This second update was carried out using the lemmatizer and MSD-tagger for Croatian described in (Agić et al. 2008, Agić et al. 2009a). It achieves results on par with state-of-the-art taggers for other Slavic languages, while in lemmatization it offers some novel solutions through its hybrid approach to disambiguation. The lemmatized, MSD-tagged and disambiguated HNK is available for querying through the standard client-server architecture Manatee/Bonito. The contribution concludes with future directions for HNK.

Original language
English

Scientific fields
Philology



WORK ASSOCIATIONS


Projects:
130-1300646-0645 - Hrvatski jezični resursi i njihovo obilježavanje (Tadić, Marko, MZOS) (CroRIS)

Institutions:
Filozofski fakultet, Zagreb

Profiles:

Marko Tadić (author)


Cite this publication:

Tadić, M. (2009) New version of the Croatian National Corpus. In: Hlaváčková, D., Horák, A., Osolsobě, K. & Rychlý, P. (eds.) After Half a Century of Slavonic Natural Language Processing. Brno, Masarykova univerzita, pp. 199-205.
@inbook{inbook,
  author = {Tadi\'{c}, Marko},
  year = {2009},
  pages = {199-205},
  keywords = {corpus, corpus linguistics, Croatian National Corpus, Croatian language},
  isbn = {978-80-7399-815-8},
  title = {New version of the Croatian National Corpus},
  publisher = {Masarykova univerzita},
  publisherplace = {Brno}
}



