Napredna pretraga

Pregled bibliografske jedinice broj: 364628

Evaluating Sentence Alignment on Croatian-English Parallel Corpora


Seljan, Sanja; Agić, Željko; Tadić, Marko
Evaluating Sentence Alignment on Croatian-English Parallel Corpora // Proceedings of the 6th International Conference on Formal Approaches to South Slavic and Balkan Languages / Tadić, Marko ; Dimitrova-Vulchanova, Mila ; Koeva, Svetla (ur.).
Zagreb: Croatian Language Technologies Society, 2008. str. 101-108 (predavanje, međunarodna recenzija, cjeloviti rad (in extenso), znanstveni)


Naslov
Evaluating Sentence Alignment on Croatian-English Parallel Corpora
(Evaluating sentence alignment on Croatian-English parallel corpora)

Autori
Seljan, Sanja ; Agić, Željko ; Tadić, Marko

Vrsta, podvrsta i kategorija rada
Radovi u zbornicima skupova, cjeloviti rad (in extenso), znanstveni

Izvornik
Proceedings of the 6th International Conference on Formal Approaches to South Slavic and Balkan Languages / Tadić, Marko ; Dimitrova-Vulchanova, Mila ; Koeva, Svetla - Zagreb : Croatian Language Technologies Society, 2008, 101-108

ISBN
978-953-55375-0-2

Skup
6th International Conference on Formal Approaches to South Slavic and Balkan Languages (FASSBL 2008)

Mjesto i datum
Dubrovnik, Hrvatska, 25-28.09.2008.

Vrsta sudjelovanja
Predavanje

Vrsta recenzije
Međunarodna recenzija

Ključne riječi
Sentence alignment; croatian-english parallel corpora

Sažetak
This paper describes an experiment in applying sentence alignment methods to Croatian-English parallel corpora and systematically evaluate their performance within the recall, precision and F-measure framework. It is our primary goal to provide an insight and a reference point on sentence alignment accuracy for Croatian-English language pair and also to extend the scope of (Tadić, 2000) – to our knowledge, the first experiment dealing with sentence alignment of Croatian-English parallel corpora – by utilizing newly implemented tools, creating corpora subsets defined by genre and finally by expanding and formalizing its preliminary observations on alignment accuracy. Therefore, in this paper we start off by briefly describing and argumenting sentence alignment paradigms of choice and presenting available language resources, subset of Croatian-English parallel corpus described in (Tadić, 2000) being our primary asset. These descriptions are followed by a formal definition of our testing framework. Results are then discussed in detail and conclusions are stated along with a brief insight on possible future work.

Izvorni jezik
Engleski

Znanstvena područja
Računarstvo, Informacijske i komunikacijske znanosti, Filologija



POVEZANOST RADA


Projekt / tema
130-1300646-0645 - Hrvatski jezični resursi i njihovo obilježavanje (Marko Tadić, )
130-1300646-0909 - Informacijska tehnologija u prevođenju hrvatskoga i e-učenju jezika (Sanja Seljan, )
130-1300646-1776 - Računalna sintaksa hrvatskoga jezika (Zdravko Dovedan Han, )

Ustanove
Filozofski fakultet, Zagreb