Data source: CROSBI

Building Multilingual Corpora for a Complex Named Entity Recognition and Classification Hierarchy using Wikipedia and DBpedia (CROSBI ID 701519)

Conference paper in proceedings | original scientific paper | international peer review

Alves, Diego ; Thakkar, Gaurish ; Amaral, Gabriel ; Kuculo, Tin ; Tadić, Marko Building Multilingual Corpora for a Complex Named Entity Recognition and Classification Hierarchy using Wikipedia and DBpedia // Proceedings of the Conference on Digital Curation Technologies (Qurator 2021) Berlin, Germany, February 8th to 12th, 2021 / Paschke, Adrian et al. (eds.). Berlin, 2021

Responsibility details

Alves, Diego ; Thakkar, Gaurish ; Amaral, Gabriel ; Kuculo, Tin ; Tadić, Marko

English

Building Multilingual Corpora for a Complex Named Entity Recognition and Classification Hierarchy using Wikipedia and DBpedia

With the ever-growing popularity of the field of NLP, the demand for datasets in low-resourced languages follows suit. Following a previously established framework, in this paper, we present the UNER dataset, a multilingual and hierarchical parallel corpus annotated for named entities. We describe in detail the developed procedure necessary to create this type of dataset in any language available on Wikipedia with DBpedia information. The three-step procedure extracts entities from Wikipedia articles, links them to DBpedia, and maps the DBpedia sets of classes to the UNER labels. This is followed by a post-processing procedure that significantly increases the number of identified entities in the final results. The paper concludes with a statistical and qualitative analysis of the resulting dataset.
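
The three-step procedure named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the in-memory `DBPEDIA_CLASSES` table stands in for a real DBpedia lookup, the `CLASS_TO_LABEL` table and its label names are hypothetical and do not reproduce the actual UNER hierarchy, and the post-processing step is left out.

```python
import re

# Hypothetical stand-in for DBpedia: article title -> set of ontology classes.
# A real pipeline would look these up in DBpedia itself.
DBPEDIA_CLASSES = {
    "Berlin": {"dbo:City", "dbo:Place"},
    "Angela Merkel": {"dbo:Person", "dbo:Politician"},
}

# Hypothetical class-to-label table. The label names are illustrative only
# and are NOT the real UNER labels.
CLASS_TO_LABEL = {
    "dbo:City": "CITY",
    "dbo:Person": "PERSON",
}

# Matches wiki links of the form [[Target]] or [[Target|anchor text]].
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def extract_entities(wikitext):
    """Step 1: return (surface text, target title) pairs for each wiki link."""
    return [
        (m.group(2) or m.group(1), m.group(1).strip())
        for m in WIKILINK.finditer(wikitext)
    ]


def annotate(wikitext):
    """Steps 2 and 3: link each entity to its classes, then map to a label."""
    annotations = []
    for surface, title in extract_entities(wikitext):
        # Step 2: link the article title to its (assumed) DBpedia classes.
        classes = DBPEDIA_CLASSES.get(title, set())
        # Step 3: map the class set to labels; unmapped classes are ignored.
        labels = sorted(CLASS_TO_LABEL[c] for c in classes if c in CLASS_TO_LABEL)
        if labels:
            annotations.append((surface, labels[0]))
    return annotations


result = annotate("[[Angela Merkel|Merkel]] visited [[Berlin]] and [[Paris]].")
```

In this sketch, links whose target has no known classes (here `Paris`) are simply dropped, which hints at why a post-processing pass could recover additional entities.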

named-entity ; multilingualism ; data extraction

Contribution details

17

2021.

published

Host publication details

Proceedings of the Conference on Digital Curation Technologies (Qurator 2021) Berlin, Germany, February 8th to 12th, 2021

Paschke, Adrian et al.

Berlin:

1613-0073

Conference details

QURATOR 2021: Conference on Digital Curation Technologies

lecture

08.02.2021-12.02.2021

Berlin, Germany

Related fields

Philology, Information and communication sciences
