Bibliographic record number: 516695

Towards Obtaining High Quality Sentence-Aligned English-Croatian Parallel Corpus


Brkić, Marija; Matetić, Maja; Seljan, Sanja
Towards Obtaining High Quality Sentence-Aligned English-Croatian Parallel Corpus // Proceedings of the 4th IEEE International Conference on Computer Science and Information Technology ICCSIT 2011
Chengdu, 2011, pp. 1068-1070 (lecture, international peer review, full paper (in extenso), scientific)


CROSBI ID: 516695

Title
Towards Obtaining High Quality Sentence-Aligned English-Croatian Parallel Corpus

Authors
Brkić, Marija; Matetić, Maja; Seljan, Sanja

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the 4th IEEE International Conference on Computer Science and Information Technology ICCSIT 2011. Chengdu, 2011, pp. 1068-1070

Conference
4th IEEE International Conference on Computer Science and Information Technology ICCSIT 2011

Place and date
Sichuan, China, 10 June 2011 - 12 June 2011

Type of participation
Lecture

Type of review
International peer review

Keywords
Sentence alignment; alignment tools; sentence alignment evaluation; parallel corpus; sentence-length; word-correspondence

Abstract
This paper presents the acquisition of a parallel bilingual corpus and all the steps involved in unsupervised sentence alignment, such as tokenization and lowercasing. The problem of sentence alignment is not trivial because translators do not necessarily translate one sentence in the source language into one sentence in the target language. Three different unsupervised, language-independent approaches to sentence alignment are presented, and implementations of these approaches in three freely available tools are tested. A gold standard for evaluating English-Croatian automatic sentence alignment is created. Finally, a detailed analysis of the acquired corpus is given.
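
To make the non-one-to-one problem concrete, the following is a minimal sketch of a sentence-length-based aligner with a simple check against a gold standard. It is not the method of any tool tested in the paper: the cost function (absolute difference in character length plus a fixed penalty per alignment type), the penalty values, the whitespace tokenizer, and all function names are assumptions chosen for illustration only.

```python
from typing import Dict, List, Tuple

# (source sentence indices, target sentence indices)
Bead = Tuple[List[int], List[int]]

# Allowed alignment shapes with illustrative penalties (assumed values).
BEAD_PENALTY: Dict[Tuple[int, int], float] = {
    (1, 1): 0.0,  # one source sentence to one target sentence
    (1, 2): 2.0,  # one source sentence split into two target sentences
    (2, 1): 2.0,  # two source sentences merged into one target sentence
    (1, 0): 3.0,  # source sentence with no counterpart
    (0, 1): 3.0,  # target sentence with no counterpart
}


def preprocess(sentence: str) -> List[str]:
    """Lowercase and split on whitespace (a stand-in for real tokenization)."""
    return sentence.lower().split()


def char_length(sentences: List[str]) -> int:
    """Total number of non-space characters in a group of sentences."""
    return sum(len(token) for s in sentences for token in preprocess(s))


def align(src: List[str], tgt: List[str]) -> List[Bead]:
    """Find the cheapest sequence of beads by dynamic programming."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEAD_PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                diff = abs(char_length(src[i:ni]) - char_length(tgt[j:nj]))
                candidate = cost[i][j] + diff + penalty
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the beads.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


def precision_recall(predicted: List[Bead], gold: List[Bead]) -> Tuple[float, float]:
    """Share of predicted beads found in the gold standard, and vice versa."""
    pred = {(tuple(s), tuple(t)) for s, t in predicted}
    ref = {(tuple(s), tuple(t)) for s, t in gold}
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall


src = ["The dog barked.", "It was hungry and it wanted food."]
tgt = ["Pas je lajao.", "Bio je gladan.", "Htio je hranu."]
print(align(src, tgt))  # [([0], [0]), ([1], [1, 2])]
```

In this toy example the second English sentence is paired with two Croatian sentences, which is exactly the case a strict one-to-one pairing would get wrong. Real aligners typically replace the raw length difference with a probabilistic length model or add word-correspondence evidence.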

Source language
English

Scientific fields
Information and communication sciences



CONNECTIONS OF THE WORK


Projects:
130-1300646-0909 - Informacijska tehnologija u prevođenju hrvatskoga i e-učenju jezika (Seljan, Sanja, MZOS)
318-0361935-0852 - Govorne tehnologije (Ipšić, Ivo, MZOS)

Institutions:
Filozofski fakultet, Zagreb
Fakultet informatike i digitalnih tehnologija, Rijeka

Profiles:

Sanja Seljan (author)

Maja Matetić (author)

Marija Brkić Bakarić (author)

Cite this publication:

Brkić, Marija; Matetić, Maja; Seljan, Sanja
Towards Obtaining High Quality Sentence-Aligned English-Croatian Parallel Corpus // Proceedings of the 4th IEEE International Conference on Computer Science and Information Technology ICCSIT 2011
Chengdu, 2011, pp. 1068-1070 (lecture, international peer review, full paper (in extenso), scientific)
Brkić, M., Matetić, M. & Seljan, S. (2011) Towards Obtaining High Quality Sentence-Aligned English-Croatian Parallel Corpus. In: Proceedings of the 4th IEEE International Conference on Computer Science and Information Technology ICCSIT 2011.
@inproceedings{article, author = {Brki\'{c}, Marija and Mateti\'{c}, Maja and Seljan, Sanja}, year = {2011}, pages = {1068-1070}, keywords = {Sentence alignment, alignment tools, sentence alignment evaluation, parallel corpus, sentence-length, word-correspondence}, title = {Towards Obtaining High Quality Sentence-Aligned English-Croatian Parallel Corpus}, booktitle = {Proceedings of the 4th IEEE International Conference on Computer Science and Information Technology ICCSIT 2011}, publisherplace = {Sichuan, China} }