Information Extraction from Free-Form CV Documents in Multiple Languages (CROSBI ID 295560)
Journal article | original scientific paper | international peer review
Authorship
Vukadin, Davor ; Kurdija, Adrian Satja ; Delač, Goran ; Šilić, Marin
English
Information Extraction from Free-Form CV Documents in Multiple Languages
This paper proposes two natural language processing models for extracting useful information from multilingual, unstructured (free-form) CV documents. The models identify the relevant document sections (personal information, education, employment, etc.) and, at the lower level of the hierarchy, the corresponding specific information (names, addresses, roles, skill competences, etc.). Our approach employs the transformer architecture, specifically the multilingual implementation of its encoder in the form of the BERT language model. The models are trained and tested on a large, manually annotated CV dataset, achieving high scores on standard accuracy measures. The proposed models exhibit the important properties of end-to-end training and interpretability; the latter was investigated by visualizing the model attention and its vector representations.
Information retrieval ; Natural language processing ; Text analysis ; Recurrent neural networks ; CV parsing