Bibliographic record number: 1131836
Information Extraction from Free-Form CV Documents in Multiple Languages
Information Extraction from Free-Form CV Documents in Multiple Languages // IEEE Access, 9 (2021), 84559-84575 doi:10.1109/access.2021.3087913 (international peer review, article, scientific)
CROSBI ID: 1131836
Title
Information Extraction from Free-Form CV Documents in Multiple Languages
Authors
Vukadin, Davor ; Kurdija, Adrian Satja ; Delač, Goran ; Šilić, Marin
Source
IEEE Access (2169-3536) 9 (2021); 84559-84575
Type, subtype, and category of work
Journal papers, article, scientific
Keywords
Information retrieval ; Natural language processing ; Text analysis ; Recurrent neural networks ; CV parsing
Abstract
This paper proposes two natural language processing models for extracting useful information from multilingual, unstructured (free form) CV documents. The model identifies the relevant document sections (personal information, education, employment, etc.) and the corresponding specific information at the lower hierarchy level (names, addresses, roles, skill competences, etc.). Our approach employs the transformer architecture and its multilingual implementation of the encoder part in the form of the BERT language model. The models are trained and tested on a large, manually annotated CV dataset, achieving high scores on standard accuracy measures. The proposed models exhibit important properties of end-to-end training and interpretability, which was investigated by visualizing the model attention and its vector representations.
Original language
English
Scientific fields
Computing
WORK AFFILIATIONS
Projects:
HRZZ-IP-2018-01-6423 - Reliable Composite Applications Based on Web Services (RELS) (Srbljić, Siniša, HRZZ)
Institutions:
Faculty of Electrical Engineering and Computing, Zagreb
Journal indexed in:
- Current Contents Connect (CCC)
- Web of Science Core Collection (WoSCC)
- Science Citation Index Expanded (SCI-EXP)
- SCI-EXP, SSCI and/or A&HCI
- Scopus