Bibliographic record no. 364628
Evaluating Sentence Alignment on Croatian-English Parallel Corpora
Evaluating Sentence Alignment on Croatian-English Parallel Corpora // Proceedings of the 6th International Conference on Formal Approaches to South Slavic and Balkan Languages / Tadić, Marko ; Dimitrova-Vulchanova, Mila ; Koeva, Svetla (eds.).
Zagreb: Hrvatsko društvo za jezične tehnologije, 2008, pp. 101-108 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 364628
Title
Evaluating Sentence Alignment on Croatian-English Parallel Corpora
Authors
Seljan, Sanja ; Agić, Željko ; Tadić, Marko
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the 6th International Conference on Formal Approaches to South Slavic and Balkan Languages
/ Tadić, Marko ; Dimitrova-Vulchanova, Mila ; Koeva, Svetla - Zagreb : Hrvatsko društvo za jezične tehnologije, 2008, 101-108
ISBN
978-953-55375-0-2
Conference
6th International Conference on Formal Approaches to South Slavic and Balkan Languages (FASSBL 2008)
Place and date
Dubrovnik, Croatia, 25.09.2008 - 28.09.2008
Type of participation
Lecture
Type of review
International peer review
Keywords
sentence alignment; Croatian-English parallel corpora
Abstract
This paper describes an experiment in applying sentence alignment methods to Croatian-English parallel corpora and systematically evaluating their performance within the recall, precision and F-measure framework. Our primary goal is to provide insight into, and a reference point for, sentence alignment accuracy for the Croatian-English language pair. We also extend the scope of (Tadić, 2000), to our knowledge the first experiment dealing with sentence alignment of Croatian-English parallel corpora, by utilizing newly implemented tools, creating corpus subsets defined by genre, and expanding and formalizing its preliminary observations on alignment accuracy. We start by briefly describing and justifying the sentence alignment paradigms of choice and presenting the available language resources, with a subset of the Croatian-English parallel corpus described in (Tadić, 2000) as our primary asset. These descriptions are followed by a formal definition of our testing framework. The results are then discussed in detail, and conclusions are stated along with a brief outlook on possible future work.
Original language
English
Scientific fields
Computer science, Information and communication sciences, Philology
PAPER AFFILIATIONS
Projects:
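130-1300646-0645 - Hrvatski jezični resursi i njihovo obilježavanje [Croatian language resources and their annotation] (Tadić, Marko, MZOS) (CroRIS)
130-1300646-0909 - Informacijska tehnologija u prevođenju hrvatskoga i e-učenju jezika [Information technology in the translation of Croatian and in language e-learning] (Seljan, Sanja, MZOS) (CroRIS)
130-1300646-1776 - Računalna sintaksa hrvatskoga jezika [Computational syntax of the Croatian language] (Dovedan Han, Zdravko, MZOS) (CroRIS)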
Institutions:
Filozofski fakultet, Zagreb