Third Workshop on NLP for Similar Languages, Varieties and Dialects

Third Workshop on NLP for Similar Languages, Varieties and Dialects, Osaka: Association for Computational Linguistics, 2016 (proceedings)

Nakov, Preslav ; Zampieri, Marcos ; Tan, Liling ; Ljubešić, Nikola ; Tiedemann, Jörg ; Ali, Ahmed

natural language processing; similar languages; varieties; dialects

VarDial is a well-established series of workshops, attracting researchers working on a range of topics related to the study of linguistic variation, e.g., on building language resources for language varieties and dialects or on creating language technology and applications that make use of language closeness and exploit existing resources in a related language or a language variant. The research presented in the two previous editions, namely VarDial’2014, which was co-located with COLING’2014, and LT4VarDial’2015, which was held together with RANLP’2015, focused on topics such as machine translation between closely related languages, adaptation of POS taggers and parsers for similar languages and language varieties, compilation of corpora for language varieties, spelling normalization, and finally discrimination between and identification of similar languages. The latter was also the topic of the DSL shared task, held in conjunction with the workshop. We believe that this is a very timely series of workshops, as research in language variation is much needed in today’s multilingual world, where several closely-related languages, language varieties, and dialects are in daily use, not only as spoken colloquial language but also in written media, e.g., in SMS, chats, and social networks. Language resources for these varieties and dialects are sparse, and extending them could be very labor-intensive. Yet, these efforts can often be reduced by making use of pre-existing resources and tools for related, resource-richer languages. Examples of closely-related language varieties include the different variants of Spanish in Latin America, the Arabic dialects in North Africa and the Middle East, German in Germany, Austria and Switzerland, French in France and in Belgium, etc.
Examples of pairs of related languages include Swedish-Norwegian, Bulgarian-Macedonian, Serbian-Bosnian, Spanish-Catalan, Russian-Ukrainian, Irish-Scottish Gaelic, Malay-Indonesian, Turkish-Azerbaijani, Mandarin-Cantonese, Hindi-Urdu, etc. This great interest of the community has made possible the third edition of the Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial’2016), co-located with COLING’2016. As part of the workshop, we organized the third edition of the Discriminating between Similar Languages (DSL) shared task, which offered an opportunity for researchers and developers to investigate the performance of computational methods for distinguishing between closely-related languages and language varieties, thus bridging an important gap for language identification. For the first time, the DSL task was divided into two sub-tasks: Sub-task 1 focusing on similar languages and language varieties, and Sub-task 2 on Arabic dialect identification. The third edition of the DSL shared task received a very positive response from the community and a record number of participants. A total of 37 teams registered to participate in the DSL shared task, 24 of them submitted official runs, and 20 of the latter also wrote system description papers, which appear in this volume along with a shared task report by the task organizers. These numbers represent a substantial increase in participation compared to the 2014 and 2015 editions of the DSL task. We further received 13 regular VarDial workshop papers, and we selected nine of them to be presented at the workshop and to appear in this volume. Given the aforementioned numbers, we consider the workshop a success, and thus we are organizing a fourth edition in 2017, which will be co-located with EACL’2017.
We take the opportunity to thank the VarDial program committee and the additional reviewers for their thorough reviews, and the DSL Shared Task participants, as well as the participants with regular research papers, for their valuable feedback and discussions. We further thank our invited speakers, Mona Diab and Robert Östling, for presenting their interesting work at the workshop.
The organizers: Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, and Shervin Malmasi

Information and communication sciences


Filozofski fakultet, Zagreb


