
Bibliographic record no. 837276

Priprema usporedivih korpusa za usporedbu


Lalli Paćelat, Ivana
Priprema usporedivih korpusa za usporedbu // Proceedings of the Conference on Language Technologies & Digital Humanities / Erjavec, Tomaž; Fišer, Darja (eds.).
Ljubljana: Znanstvena založba Filozofske fakultete Univerze v Ljubljani, 2016. pp. 111-120 (lecture, international peer review, full paper (in extenso), scientific)


CROSBI ID: 837276

Title
Priprema usporedivih korpusa za usporedbu
(Preparing comparable corpora for comparison)

Authors
Lalli Paćelat, Ivana

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the Conference on Language Technologies & Digital Humanities / Erjavec, Tomaž; Fišer, Darja (eds.) - Ljubljana: Znanstvena založba Filozofske fakultete Univerze v Ljubljani, 2016, pp. 111-120

ISBN
978-961-237-862-2

Conference
Language Technologies & Digital Humanities

Place and date
Ljubljana, Slovenia, 29 September - 1 October 2016

Type of participation
Lecture

Type of review
International peer review

Keywords
usporedivi korpusi; kontrastivna analiza; POS označavanje; korpusna lingvistika
(comparable corpora; corpus-based contrastive analysis; POS tagging; corpus linguistics)

Abstract
Although the six corpora included in the research were comparable in size, purpose and structure, the planned quantitative analysis also required them to be comparable at the level of POS and MSD tagging. Since the tagsets used to annotate the corpora were only partially compatible, several procedures were needed to convert the existing tags to a common tagset and obtain comparable results. However, even where tagsets are fully compatible with international standards, they still need to be examined and compared, because languages, here Croatian and Italian, differ in which grammatical categories exist and in how they are perceived. The differences among the tagsets of the six corpora were identified first, followed by a detailed contrastive analysis of the two languages; once the only possible common POS and MSD tagset was found, the corpora were normalized. To achieve better comparability of results at the inter-lingual level, only the distribution within the common, comparable and relevant tags was taken into account, which contributed to greater reliability and accuracy of the results. On the one hand, this paper confirmed the importance of systematically planning the linguistic annotation scheme for each language in accordance with guidelines that prescribe international standards and create conditions for comparability across corpora at both the inter-lingual and intra-lingual levels. On the other hand, the paper showed that comparing and analysing MSD or POS tagsets can be considered a good basis for, and an interesting approach to, contrastive analysis.
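
The normalization step described in the abstract can be pictured roughly as follows. This is only a minimal sketch under stated assumptions, not the procedure used in the paper: every tag name, the mapping tables and the function names are hypothetical placeholders, since the record does not list the actual tagsets. Each corpus-specific tag is mapped to a common tag, tags without a comparable counterpart are dropped, and the relative distribution is computed over the retained tags only.

```python
from collections import Counter

# Hypothetical mappings from corpus-specific tags to a shared tagset.
# All tag names are invented for illustration. A value of None marks
# a tag with no comparable counterpart, which is excluded.
TAG_MAPS = {
    "hr": {"N_hr": "NOUN", "V_hr": "VERB", "A_hr": "ADJ", "PART_hr": None},
    "it": {"N_it": "NOUN", "V_it": "VERB", "A_it": "ADJ", "ART_it": None},
}


def normalize(tags, tag_map):
    """Convert corpus-specific tags to common tags, dropping non-comparable ones.

    Tags missing from the mapping are treated as non-comparable as well.
    """
    mapped = (tag_map.get(tag) for tag in tags)
    return [tag for tag in mapped if tag is not None]


def relative_distribution(common_tags):
    """Return the share of each common tag among the retained tokens."""
    counts = Counter(common_tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


# Toy tag sequences standing in for two annotated corpora.
hr_tags = ["N_hr", "V_hr", "N_hr", "PART_hr", "A_hr"]
it_tags = ["ART_it", "N_it", "V_it", "A_it", "N_it"]

hr_dist = relative_distribution(normalize(hr_tags, TAG_MAPS["hr"]))
it_dist = relative_distribution(normalize(it_tags, TAG_MAPS["it"]))
```

Computing shares over the retained tags only, rather than over all tokens, keeps the two distributions on the same footing even when one language has categories that the other lacks.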

Original language
Croatian

Scientific fields
Philology



AFFILIATIONS OF THE WORK


Institutions:
Sveučilište Jurja Dobrile u Puli

Profiles:

Ivana Lalli-Paćelat (author)

Links to the full text of the paper:

Full-text access: nl.ijs.si

Cite this publication:

Lalli Paćelat, Ivana
Priprema usporedivih korpusa za usporedbu // Proceedings of the Conference on Language Technologies & Digital Humanities / Erjavec, Tomaž; Fišer, Darja (eds.).
Ljubljana: Znanstvena založba Filozofske fakultete Univerze v Ljubljani, 2016. pp. 111-120 (lecture, international peer review, full paper (in extenso), scientific)
Lalli Paćelat, I. (2016) Priprema usporedivih korpusa za usporedbu. In: Erjavec, Tomaž; Fišer, Darja (eds.) Proceedings of the Conference on Language Technologies & Digital Humanities.
@article{article,
  author = {Lalli Pa\'{c}elat, Ivana},
  year = {2016},
  pages = {111-120},
  keywords = {usporedivi korpusi, kontrastivna analiza, POS ozna\v{c}avanje, korpusna lingvistika},
  isbn = {978-961-237-862-2},
  title = {Priprema usporedivih korpusa za usporedbu},
  keyword = {usporedivi korpusi, kontrastivna analiza, POS ozna\v{c}avanje, korpusna lingvistika},
  publisher = {Znanstvena zalo\v{z}ba Filozofske fakultete Univerze v Ljubljani},
  publisherplace = {Ljubljana, Slovenija}
}

@article{article,
  author = {Lalli Pa\'{c}elat, Ivana},
  year = {2016},
  pages = {111-120},
  keywords = {comparable corpora, corpus-based contrastive analysis, POS tagging, corpus linguistics},
  isbn = {978-961-237-862-2},
  title = {Preparing comparable corpora for comparison},
  keyword = {comparable corpora, corpus-based contrastive analysis, POS tagging, corpus linguistics},
  publisher = {Znanstvena zalo\v{z}ba Filozofske fakultete Univerze v Ljubljani},
  publisherplace = {Ljubljana, Slovenija}
}



