Information Extraction from Free-Form CV Documents in Multiple Languages

Vukadin, Davor; Kurdija, Adrian Satja; Delač, Goran; Šilić, Marin

Data source: CROSBI

Information Extraction from Free-Form CV Documents in Multiple Languages (CROSBI ID 295560)

Journal article | original scientific paper | international peer review

Vukadin, Davor; Kurdija, Adrian Satja; Delač, Goran; Šilić, Marin. Information Extraction from Free-Form CV Documents in Multiple Languages // IEEE Access, 9 (2021), 84559-84575. doi: 10.1109/access.2021.3087913

Responsibility details

Authors

Vukadin, Davor; Kurdija, Adrian Satja; Delač, Goran; Šilić, Marin

Basic data in the original language

Language

English

Title

Information Extraction from Free-Form CV Documents in Multiple Languages

Abstract

This paper proposes two natural language processing models for extracting useful information from multilingual, unstructured (free-form) CV documents. The models identify the relevant document sections (personal information, education, employment, etc.) and the corresponding specific information at the lower hierarchy level (names, addresses, roles, skill competences, etc.). Our approach employs the transformer architecture, using a multilingual implementation of its encoder part in the form of the BERT language model. The models are trained and tested on a large, manually annotated CV dataset, achieving high scores on standard accuracy measures. The proposed models support end-to-end training and are interpretable; interpretability was investigated by visualizing the model attention and its vector representations.

Keywords

Information retrieval ; Natural language processing ; Text analysis ; Recurrent neural networks ; CV parsing

Note

not recorded

Basic data in other languages (language, title, abstract, keywords, note): not recorded

Publication details

Journal

IEEE Access

Volume (issue)

9

Year

2021

Pages

84559-84575

Publication status

published

e-ISSN

2169-3536

DOI

10.1109/access.2021.3087913

Open-access publication cost

APC

1.00 USD

Related entities

Related persons

Davor Vukadin (author)

Adrian Satja Kurdija (author)

Goran Delač (author)

Marin Šilić (author)

Related institutions

Fakultet elektrotehnike i računarstva (036) (author's institution)

Related projects

Pouzdani kompozitni primjenski sustavi zasnovani na web uslugama (result of work on the project)

Field

Computing

Links

doi.org

ieeexplore.ieee.org

Indexing

Scopus

Current Contents Connect (CCC)

Web of Science Core Collection, Science Citation Index Expanded (WoSCC-SCI-Exp)

Web of Science Core Collection, SCI-Exp, SSCI & A&HCI (WoSCC-SCI-Exp, SSCI, A&HCI)