Bibliographic record no. 930470
Merging Comparable Data Sources for the Discrimination of Similar Languages: The DSL Corpus Collection
Merging Comparable Data Sources for the Discrimination of Similar Languages: The DSL Corpus Collection // Proceedings of the 7th Workshop on Building and Using Comparable Corpora (BUCC)
Reykjavík: European Language Resources Association (ELRA), 2014. pp. 20-24 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 930470
Title
Merging Comparable Data Sources for the Discrimination of Similar Languages: The DSL Corpus Collection
Authors
Tan, Liling ; Zampieri, Marcos ; Ljubešić, Nikola ; Tiedemann, Jörg
Type, subtype, and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the 7th Workshop on Building and Using Comparable Corpora (BUCC)
Reykjavík: European Language Resources Association (ELRA), 2014, pp. 20-24
Conference
7th Workshop on Building and Using Comparable Corpora (BUCC)
Place and date
Reykjavík, Iceland, 27 May 2014
Type of participation
Lecture
Type of review
International peer review
Keywords
comparable corpora, similar languages, language discrimination
Abstract
This paper presents the compilation of the DSL corpus collection created for the DSL (Discriminating Similar Languages) shared task to be held at the VarDial workshop at COLING 2014. The DSL corpus collection was merged from three comparable corpora to provide a suitable dataset for automatic classification to discriminate similar languages and language varieties. Along with the description of the DSL corpus collection, we also present results of baseline discrimination experiments, reporting accuracy of up to 87.4%.
Original language
English
Indexed in:
- Scopus