Priprema usporedivih korpusa za usporedbu

Lalli Paćelat, Ivana

Data source: CROSBI

Priprema usporedivih korpusa za usporedbu (CROSBI ID 639869)

Conference paper in proceedings | original scientific paper | international peer review

Lalli Paćelat, Ivana. Priprema usporedivih korpusa za usporedbu // Proceedings of the Conference on Language Technologies & Digital Humanities / Erjavec, Tomaž; Fišer, Darja (eds.). Ljubljana: Znanstvena založba Filozofske fakultete Univerze v Ljubljani, 2016. pp. 111-120

Responsibility details

Authors

Lalli Paćelat, Ivana


Language

Croatian

Title

Priprema usporedivih korpusa za usporedbu

Abstract

Although the six corpora included in the research were comparable in size, purpose and structure, the planned quantitative analysis also required them to be comparable at the level of part-of-speech (POS) and morphosyntactic description (MSD) tagging. Since the tagsets used to annotate the corpora were only partially compatible, several procedures were needed to convert the existing tags to a common tagset and so obtain comparable results. However, even where tagsets are fully compatible with international standards, they still have to be examined and compared, because languages, here Croatian and Italian, differ both in how grammatical categories are perceived and in which categories exist at all. Once the differences among the tagsets of the six corpora had been identified, a detailed contrastive analysis of the two languages had been carried out, and the only possible common POS and MSD tagset had been established, the corpora were normalized. To improve the comparability of results at the inter-lingual level, only the distribution within the common, comparable and relevant tags was taken into account, which contributed to greater reliability and accuracy of the results. On the one hand, this paper confirmed the importance of systematically planning the linguistic annotation scheme for each language in accordance with the guidelines laid down by international standards, which create the conditions for comparability across corpora at both the inter-lingual and intra-lingual levels. On the other hand, it showed that comparing and analysing MSD or POS tagsets can serve as a good basis for, and an interesting approach to, contrastive analysis.
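
The normalization step described in the abstract can be illustrated with a minimal sketch. It is not the author's procedure: the tag inventories, the mapping tables and the function names below are hypothetical, chosen only to show the idea of mapping corpus-specific tags to a shared tagset and then computing the distribution over the common tags alone.

```python
from collections import Counter

# Hypothetical mappings from corpus-specific POS tags to a shared tagset.
# The tag names are illustrative only; None marks a tag with no common counterpart.
HR_TO_COMMON = {"N": "NOUN", "V": "VERB", "A": "ADJ", "Q": None}
IT_TO_COMMON = {"S": "NOUN", "V": "VERB", "A": "ADJ", "RD": None}


def normalize(tags, mapping):
    """Map tags to the common tagset, dropping tags without a counterpart."""
    mapped = (mapping.get(tag) for tag in tags)
    return [tag for tag in mapped if tag is not None]


def relative_distribution(tags):
    """Return each tag's share of the total, computed over the given tags only."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: count / total for tag, count in counts.items()}


# Toy tag sequences standing in for two annotated corpora (assumed data).
hr_tags = ["N", "V", "N", "Q", "A"]
it_tags = ["RD", "S", "V", "A", "S", "RD"]

hr_dist = relative_distribution(normalize(hr_tags, HR_TO_COMMON))
it_dist = relative_distribution(normalize(it_tags, IT_TO_COMMON))
```

Because tags without a common counterpart are dropped before counting, the shares in `hr_dist` and `it_dist` are computed over the same set of categories and can be compared directly.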

Keywords

comparable corpora; contrastive analysis; POS tagging; corpus linguistics

Note

not recorded

Language

English

Title

Preparing comparable corpora for comparison

Abstract

not recorded

Keywords

comparable corpora; corpus-based contrastive analysis; POS tagging; corpus linguistics

Note

not recorded

Paper details

Pages

111-120.

Year of publication

2016.

Publication status

published

Host publication details

Title

Proceedings of the Conference on Language Technologies & Digital Humanities

Editors

Erjavec, Tomaž; Fišer, Darja

Publisher

Ljubljana: Znanstvena založba Filozofske fakultete Univerze v Ljubljani

ISBN

978-961-237-862-2

Conference details

Conference

Language Technologies & Digital Humanities

Type of participation

lecture

Conference dates

29.09.2016-01.10.2016

Conference location

Ljubljana, Slovenia

Related entities

Related persons

Ivana Lalli Paćelat (author)

Related institutions

Sveučilište Jurja Dobrile u Puli (303) (author's institution)

Field

Philology

Links

nl.ijs.si