Bibliographic record number: 930470

Merging Comparable Data Sources for the Discrimination of Similar Languages: The DSL Corpus Collection


Tan, Liling; Zampieri, Marcos; Ljubešić, Nikola; Tiedemann, Jörg
Merging Comparable Data Sources for the Discrimination of Similar Languages: The DSL Corpus Collection // Proceedings of the 7th Workshop on Building and Using Comparable Corpora (BUCC)
Reykjavik, Iceland: ELRA, 2014. pp. 20-24 (lecture, international peer review, full paper (in extenso), scientific)


Title
Merging Comparable Data Sources for the Discrimination of Similar Languages: The DSL Corpus Collection

Authors
Tan, Liling ; Zampieri, Marcos ; Ljubešić, Nikola ; Tiedemann, Jörg

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the 7th Workshop on Building and Using Comparable Corpora (BUCC) / - Reykjavik, Iceland : ELRA, 2014, 20-24

Conference
7th Workshop on Building and Using Comparable Corpora (BUCC)

Place and date
Reykjavik, Iceland, 27 May 2014

Type of participation
Lecture

Type of review
International peer review

Keywords
Comparable corpora, similar languages, language discrimination

Abstract
This paper presents the compilation of the DSL corpus collection created for the DSL (Discriminating Similar Languages) shared task to be held at the VarDial workshop at COLING 2014. The DSL corpus collection was merged from three comparable corpora to provide a suitable dataset for automatic classification to discriminate similar languages and language varieties. Along with the description of the DSL corpus collection, we also present results of baseline discrimination experiments, reporting accuracy of up to 87.4%.

Original language
English



AFFILIATIONS OF THE WORK


Institutions
Filozofski fakultet, Zagreb

Author with researcher ID number:
Nikola Ljubešić, (272820)

Indexed in:


  • Scopus