
Bibliographic record number: 1119513

Building Multilingual Corpora for a Complex Named Entity Recognition and Classification Hierarchy using Wikipedia and DBpedia


Alves, Diego; Thakkar, Gaurish; Amaral, Gabriel; Kuculo, Tin; Tadić, Marko
Building Multilingual Corpora for a Complex Named Entity Recognition and Classification Hierarchy using Wikipedia and DBpedia // Proceedings of the Conference on Digital Curation Technologies (Qurator 2021) Berlin, Germany, February 8th to 12th, 2021 / Paschke, Adrian et al. (eds.).
Berlin, 2021. 17, 11 (lecture, international peer review, full paper (in extenso), scientific)


CROSBI ID: 1119513

Title
Building Multilingual Corpora for a Complex Named Entity Recognition and Classification Hierarchy using Wikipedia and DBpedia

Authors
Alves, Diego ; Thakkar, Gaurish ; Amaral, Gabriel ; Kuculo, Tin ; Tadić, Marko

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the Conference on Digital Curation Technologies (Qurator 2021) Berlin, Germany, February 8th to 12th, 2021 / Paschke, Adrian et al. - Berlin, 2021

Conference
QURATOR 2021: Conference on Digital Curation Technologies

Place and date
Berlin, Germany, 08.02.2021 - 12.02.2021

Type of participation
Lecture

Type of review
International peer review

Keywords
named-entity ; multilingualism ; data extraction

Abstract
With the ever-growing popularity of the field of NLP, the demand for datasets in low-resourced languages follows suit. Following a previously established framework, in this paper we present the UNER dataset, a multilingual and hierarchical parallel corpus annotated for named entities. We describe in detail the procedure developed to create this type of dataset in any language available on Wikipedia with DBpedia information. The three-step procedure extracts entities from Wikipedia articles, links them to DBpedia, and maps the DBpedia sets of classes to the UNER labels. This is followed by a post-processing procedure that significantly increases the number of identified entities in the final results. The paper concludes with a statistical and qualitative analysis of the resulting dataset.
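The three steps named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the in-memory `DBPEDIA_TYPES` table stands in for a real DBpedia lookup, the label strings in `CLASS_TO_LABEL` are placeholders rather than actual UNER tags, and the function names are hypothetical. It only assumes that entity mentions are marked in the article source as wiki links of the form `[[Target|anchor]]`.

```python
import re

# Hypothetical stand-in for a DBpedia lookup: article title -> set of classes.
DBPEDIA_TYPES = {
    "Berlin": {"dbo:City", "dbo:Place"},
    "Angela Merkel": {"dbo:Person", "dbo:Politician"},
}

# Placeholder class-to-label rules (NOT real UNER tags), most specific first.
CLASS_TO_LABEL = [
    ("dbo:City", "Location/City"),
    ("dbo:Politician", "Person/Politician"),
    ("dbo:Person", "Person/Other"),
    ("dbo:Place", "Location/Other"),
]

# Matches [[Target]] and [[Target|anchor text]].
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def extract_links(wikitext):
    """Step 1: strip link markup, returning plain text and (start, end, target) spans."""
    parts, spans = [], []
    pos = 0      # read position in the wikitext
    offset = 0   # write position in the plain text
    for m in WIKILINK.finditer(wikitext):
        before = wikitext[pos:m.start()]
        parts.append(before)
        offset += len(before)
        target = m.group(1).strip()
        anchor = m.group(2) or m.group(1)
        spans.append((offset, offset + len(anchor), target))
        parts.append(anchor)
        offset += len(anchor)
        pos = m.end()
    parts.append(wikitext[pos:])
    return "".join(parts), spans


def label_for(classes):
    """Step 3: map a set of classes to the first matching placeholder label."""
    for cls, label in CLASS_TO_LABEL:
        if cls in classes:
            return label
    return None


def annotate(wikitext):
    """Run all three steps and return (plain_text, [(mention, label), ...])."""
    text, spans = extract_links(wikitext)
    entities = []
    for start, end, target in spans:
        classes = DBPEDIA_TYPES.get(target, set())  # Step 2: link to the lookup table
        label = label_for(classes)
        if label is not None:
            entities.append((text[start:end], label))
    return text, entities


if __name__ == "__main__":
    sample = "[[Angela Merkel|Merkel]] visited [[Berlin]] and [[Unknown Page]]."
    print(annotate(sample))
```

Links whose target has no entry in the lookup table are left unlabeled here; the post-processing step that the abstract credits with increasing the number of identified entities is not modeled in this sketch.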

Original language
English

Scientific fields
Information and communication sciences, Philology



AFFILIATIONS


Institutions:
Filozofski fakultet, Zagreb

Links to the full text of the paper:

ceur-ws.org

Cite this publication:

Alves, Diego; Thakkar, Gaurish; Amaral, Gabriel; Kuculo, Tin; Tadić, Marko
Building Multilingual Corpora for a Complex Named Entity Recognition and Classification Hierarchy using Wikipedia and DBpedia // Proceedings of the Conference on Digital Curation Technologies (Qurator 2021) Berlin, Germany, February 8th to 12th, 2021 / Paschke, Adrian et al. (eds.).
Berlin, 2021. 17, 11 (lecture, international peer review, full paper (in extenso), scientific)
Alves, D., Thakkar, G., Amaral, G., Kuculo, T. & Tadić, M. (2021) Building Multilingual Corpora for a Complex Named Entity Recognition and Classification Hierarchy using Wikipedia and DBpedia. In: Paschke, A. (ed.) Proceedings of the Conference on Digital Curation Technologies (Qurator 2021) Berlin, Germany, February 8th to 12th, 2021.
@article{article,
  author = {Alves, Diego and Thakkar, Gaurish and Amaral, Gabriel and Kuculo, Tin and Tadi\'{c}, Marko},
  editor = {Paschke, A.},
  year = {2021},
  pages = {11},
  chapter = {17},
  keywords = {named-entity, multilingualism, data extraction},
  title = {Building Multilingual Corpora for a Complex Named Entity Recognition and Classification Hierarchy using Wikipedia and DBpedia},
  publisherplace = {Berlin, Germany}
}



