Bibliographic record no. 713076

Comparing Two Acquisition Systems for Automatically Building an English-Croatian Parallel Corpus from Multilingual Websites


Espla-Gomis, Miquel; Klubička, Filip; Ljubešić, Nikola; Ortiz-Rojas, Sergio; Papavassiliou, Vassilis; Prokopidis, Prokopis
Comparing Two Acquisition Systems for Automatically Building an English-Croatian Parallel Corpus from Multilingual Websites // Language Resources and Evaluation Conference 2014
Reykjavík, Iceland, 2014 (poster, international peer review, full paper (in extenso), scientific)


CROSBI ID: 713076

Title
Comparing Two Acquisition Systems for Automatically Building an English-Croatian Parallel Corpus from Multilingual Websites

Authors
Espla-Gomis, Miquel ; Klubička, Filip ; Ljubešić, Nikola ; Ortiz-Rojas, Sergio ; Papavassiliou, Vassilis ; Prokopidis, Prokopis

Type, subtype and category of work
Paper in conference proceedings, full paper (in extenso), scientific

Conference
Language Resources and Evaluation Conference 2014

Place and date
Reykjavík, Iceland, 26.05.2014 - 31.05.2014

Type of participation
Poster

Type of peer review
International peer review

Keywords
parallel data acquisition; focused crawling; system comparison

Abstract
In this paper we compare two tools for automatically harvesting bitexts from multilingual websites: bitextor and ILSP-FC. We used both tools to crawl 21 multilingual websites from the tourism domain in order to build a domain-specific English-Croatian parallel corpus. Different settings were tried for both tools, and 10,662 unique document pairs were obtained. A sample of about 10% of them was manually examined, and the success rate was computed on the collection of document pairs detected by each setting. We compare the performance of the settings and the amount of different corpora detected by each setting. In addition, we describe the resource obtained, both by the settings and through the human evaluation, which has been released as a high-quality parallel corpus.

Original language
English

Scientific fields
Information and communication sciences



WORK AFFILIATIONS


Institutions:
Filozofski fakultet (Faculty of Humanities and Social Sciences), Zagreb

Profiles:

Nikola Ljubešić (author)


Cite this publication:

Espla-Gomis, M., Klubička, F., Ljubešić, N., Ortiz-Rojas, S., Papavassiliou, V. & Prokopidis, P. (2014) Comparing Two Acquisition Systems for Automatically Building an English-Croatian Parallel Corpus from Multilingual Websites. In: Language Resources and Evaluation Conference 2014.
@inproceedings{EsplaGomis2014,
  author = {Espla-Gomis, Miquel and Klubi\v{c}ka, Filip and Ljube\v{s}i\'{c}, Nikola and Ortiz-Rojas, Sergio and Papavassiliou, Vassilis and Prokopidis, Prokopis},
  title = {Comparing Two Acquisition Systems for Automatically Building an English-Croatian Parallel Corpus from Multilingual Websites},
  booktitle = {Language Resources and Evaluation Conference 2014},
  address = {Reykjav\'{\i}k, Iceland},
  year = {2014},
  keywords = {parallel data acquisition, focused crawling, system comparison}
}