Bibliographic record no. 1131836

Information Extraction from Free-Form CV Documents in Multiple Languages


Vukadin, Davor; Kurdija, Adrian Satja; Delač, Goran; Šilić, Marin
Information Extraction from Free-Form CV Documents in Multiple Languages // IEEE Access, 9 (2021), 84559-84575 doi:10.1109/access.2021.3087913 (international peer review, article, scientific)


CROSBI ID: 1131836

Title
Information Extraction from Free-Form CV Documents in Multiple Languages

Authors
Vukadin, Davor ; Kurdija, Adrian Satja ; Delač, Goran ; Šilić, Marin

Source
IEEE Access (ISSN 2169-3536) 9 (2021); 84559-84575

Type, subtype and category of work
Journal papers, article, scientific

Keywords
Information retrieval ; Natural language processing ; Text analysis ; Recurrent neural networks ; CV parsing

Abstract
This paper proposes two natural language processing models for extracting useful information from multilingual, unstructured (free-form) CV documents. The models identify the relevant document sections (personal information, education, employment, etc.) and the corresponding specific information at the lower hierarchy level (names, addresses, roles, skill competences, etc.). Our approach employs the transformer architecture and its multilingual implementation of the encoder part in the form of the BERT language model. The models are trained and tested on a large, manually annotated CV dataset, achieving high scores on standard accuracy measures. The proposed models exhibit important properties of end-to-end training and interpretability, which were investigated by visualizing the model attention and its vector representations.

Original language
English

Scientific fields
Computing



RELATED PROJECTS, INSTITUTIONS AND PROFILES


Projects:
IP-2018-01-6423 - Reliable composite application systems based on web services (RELS) (Srbljić, Siniša, HRZZ - 2018-01) (POIROT)

Institutions:
Faculty of Electrical Engineering and Computing, Zagreb

Profiles:

Marin Šilić (author)

Goran Delač (author)

Adrian Satja Kurdija (author)

Links to the full text:

doi ieeexplore.ieee.org

Cite this publication:

Vukadin, Davor; Kurdija, Adrian Satja; Delač, Goran; Šilić, Marin
Information Extraction from Free-Form CV Documents in Multiple Languages // IEEE Access, 9 (2021), 84559-84575 doi:10.1109/access.2021.3087913 (international peer review, article, scientific)
Vukadin, D., Kurdija, A., Delač, G. & Šilić, M. (2021) Information Extraction from Free-Form CV Documents in Multiple Languages. IEEE Access, 9, 84559-84575 doi:10.1109/access.2021.3087913.
@article{article,
  author = {Vukadin, Davor and Kurdija, Adrian Satja and Delač, Goran and Šilić, Marin},
  title = {Information Extraction from Free-Form CV Documents in Multiple Languages},
  journal = {IEEE Access},
  year = {2021},
  volume = {9},
  pages = {84559-84575},
  issn = {2169-3536},
  doi = {10.1109/access.2021.3087913},
  keywords = {Information retrieval, Natural language processing, Text analysis, Recurrent neural networks, CV parsing}
}

The journal is indexed in:


  • Current Contents Connect (CCC)
  • Web of Science Core Collection (WoSCC)
    • Science Citation Index Expanded (SCI-EXP)
    • SCI-EXP, SSCI and/or A&HCI
  • Scopus

