Multi-label Classification of Croatian Legal Documents Using EuroVoc Thesaurus (CROSBI ID 611937)
Conference paper in proceedings | original scientific paper | international peer review
Responsibility details
Šarić, Frane ; Dalbelo Bašić, Bojana ; Moens, Marie-Francine ; Šnajder, Jan
English
Multi-label Classification of Croatian Legal Documents Using EuroVoc Thesaurus
The automatic indexing of legal documents can improve access to legislation. The EuroVoc thesaurus has been used to index documents of the European Parliament as well as national legislation. A number of studies address the task of automatic EuroVoc indexing. In this paper we describe our work on EuroVoc indexing of Croatian legislative documents. We focus on the machine learning aspect of the problem. First, we describe the manually indexed collection of Croatian legislative documents, which we make freely available. Second, we describe the multi-label classification experiments on this collection. A challenge of EuroVoc indexing is class sparsity, and we discuss some strategies to address it. Our best model achieves a precision of 79.7%, a recall of 60.2%, and an F1 score of 68.6%.
multi-label classification; automatic indexing; class sparsity; EuroVoc thesaurus; legal documents
Contribution details
7-12.
2014.
published
Host publication details
Proceedings of Workshop on Semantic Processing of Legal Texts (SPLeT2014)
Reykjavík: European Language Resources Association (ELRA)
Conference details
Workshop on Semantic Processing of Legal Texts
lecture
26.05.2014-31.05.2014
Reykjavík, Iceland