Bibliographic record number: 1063162
Natural Language Processing Chains Inside a Cross-lingual Event-Centric Knowledge Pipeline for European Union Under-resourced Languages
Natural Language Processing Chains Inside a Cross-lingual Event-Centric Knowledge Pipeline for European Union Under-resourced Languages // Proceedings of the LREC 2020 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020) / Beermann, Dorothee ; Besacier, Laurent ; Sakti, Sakriani ; Soria, Claudia (eds.).
Marseille: European Language Resources Association (ELRA), 2020. pp. 153-158 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 1063162
Title
Natural Language Processing Chains Inside a Cross-lingual Event-Centric Knowledge Pipeline for European Union Under-resourced Languages
Authors
Alves, Diego ; Thakkar, Gaurish ; Tadić, Marko
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the LREC 2020 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020) / Beermann, Dorothee ; Besacier, Laurent ; Sakti, Sakriani ; Soria, Claudia - Marseille : European Language Resources Association (ELRA), 2020, 153-158
ISBN
979-10-95546-35-1
Conference
The 12th Language Resources and Evaluation Conference (LREC2020)
Venue and date
Marseille, France, 11 May 2020 - 16 May 2020
Type of participation
Lecture
Type of review
International peer review
Keywords
language processing chains ; under-resourced language ; European languages resources
Abstract
This article presents the strategy for developing a platform containing Language Processing Chains for European Union languages, covering the steps from Tokenization to Parsing, including Named Entity Recognition, with the addition of Sentiment Analysis. These chains form part of the first step of an event-centric knowledge processing pipeline whose aim is to process multilingual media information about major events that can have an impact in Europe and the rest of the world. Because the availability of language resources differs from language to language, we have built this strategy in three steps, starting with processing chains for the well-resourced languages and finishing with the development of new modules for the under-resourced ones. To classify all European Union official languages in terms of resources, we analysed the size of annotated corpora as well as the existence of pre-trained models in mainstream Language Processing tools, and combined this information with the classification proposed in the META-NET White Paper Series.
Original language
English
Scientific fields
Information and Communication Sciences, Philology
Note
Due to the coronavirus pandemic, the conference was not held, but the proceedings were published on 2020-05-15.
RELATED TO THIS WORK
Projects:
EK-H2020-812997 - Cross-lingual Event-centric Open Analytics Research Academy (Cleopatra) (Tadić, Marko, EK - H2020-MSCA-ITN-2018) (CroRIS)
Institutions:
Filozofski fakultet, Zagreb
Profiles:
Gaurish Pandurang Thakkar (author)
Diego Fernando Valio Antunes Alves (author)
Marko Tadić (author)