Bibliographic record no.: 701463
Multi-label Classification of Croatian Legal Documents Using EuroVoc Thesaurus
Multi-label Classification of Croatian Legal Documents Using EuroVoc Thesaurus // Proceedings of Workshop on Semantic Processing of Legal Texts (SPLeT2014)
Reykjavík: European Language Resources Association (ELRA), 2014, pp. 7-12 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 701463
Title
Multi-label Classification of Croatian Legal Documents Using EuroVoc Thesaurus
Authors
Šarić, Frane ; Dalbelo Bašić, Bojana ; Moens, Marie-Francine ; Šnajder, Jan
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of Workshop on Semantic Processing of Legal Texts (SPLeT2014)
Reykjavík: European Language Resources Association (ELRA), 2014, pp. 7-12
Conference
Workshop on Semantic Processing of Legal Texts
Place and date
Reykjavík, Iceland, 26-31 May 2014
Type of participation
Lecture
Type of peer review
International peer review
Keywords
multi-label classification; automatic indexing; class sparsity; EuroVoc thesaurus; legal documents
Abstract
The automatic indexing of legal documents can improve access to legislation. The EuroVoc thesaurus has been used to index documents of the European Parliament as well as national legislation. A number of studies exist that address the task of automatic EuroVoc indexing. In this paper we describe our work on EuroVoc indexing of Croatian legislative documents. We focus on the machine learning aspect of the problem. First, we describe the manually indexed collection of Croatian legislative documents, which we make freely available. Second, we describe the multi-label classification experiments on this collection. A challenge of EuroVoc indexing is class sparsity, and we discuss some strategies to address it. Our best model achieves a precision of 79.7%, a recall of 60.2%, and an F1 score of 68.6%.
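As a quick consistency check on the figures reported in the abstract, the F1 score can be recomputed as the harmonic mean of precision and recall. The sketch below also shows one common way to score multi-label output, micro-averaging over label sets. The record does not state which averaging scheme the paper uses, so `micro_prf` is an assumption for illustration only, and the label names in the toy data are hypothetical rather than taken from the collection.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F1 over parallel lists of label sets.

    Assumption: micro-averaging is used here for illustration; the abstract
    does not say how its scores were averaged.
    """
    true_positives = sum(len(g & p) for g, p in zip(gold, predicted))
    n_predicted = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Recompute F1 from the precision and recall stated in the abstract.
reported_f1 = f1_score(0.797, 0.602)
print(f"F1 from reported P and R: {reported_f1:.3f}")

# Toy example with hypothetical descriptor names (not from the dataset).
gold = [{"fisheries", "trade"}, {"taxation"}]
predicted = [{"fisheries"}, {"taxation", "trade"}]
toy_p, toy_r, toy_f1 = micro_prf(gold, predicted)
print(f"toy P={toy_p:.3f} R={toy_r:.3f} F1={toy_f1:.3f}")
```

The recomputed value rounds to 0.686, which agrees with the 68.6% stated in the abstract.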
Original language
English
Scientific fields
Computing
PAPER AFFILIATIONS
Projects:
036-1300646-1986 - Knowledge Discovery in Textual Data (Dalbelo-Bašić, Bojana, MZO) (CroRIS)
Institutions:
Faculty of Electrical Engineering and Computing, Zagreb