Bibliographic record number: 701463

Multi-label Classification of Croatian Legal Documents Using EuroVoc Thesaurus


Šarić, Frane; Dalbelo Bašić, Bojana; Moens, Marie-Francine; Šnajder, Jan
Multi-label Classification of Croatian Legal Documents Using EuroVoc Thesaurus // Proceedings of Workshop on Semantic Processing of Legal Texts (SPLeT2014)
Reykjavik: European Language Resources Association, 2014, pp. 7-12 (lecture, international peer review, full paper (in extenso), scientific)


Title
Multi-label Classification of Croatian Legal Documents Using EuroVoc Thesaurus

Authors
Šarić, Frane ; Dalbelo Bašić, Bojana ; Moens, Marie-Francine ; Šnajder, Jan

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of Workshop on Semantic Processing of Legal Texts (SPLeT2014). Reykjavik: European Language Resources Association, 2014, pp. 7-12

Conference
Workshop on Semantic Processing of Legal Texts

Place and date
Reykjavik, Iceland, 26-31 May 2014

Type of participation
Lecture

Type of review
International peer review

Keywords
multi-label classification; automatic indexing; class sparsity; EuroVoc thesaurus; legal documents

Abstract
The automatic indexing of legal documents can improve access to legislation. The EuroVoc thesaurus has been used to index documents of the European Parliament as well as national legislation. A number of studies exist that address the task of automatic EuroVoc indexing. In this paper we describe our work on EuroVoc indexing of Croatian legislative documents. We focus on the machine learning aspect of the problem. First, we describe the collection of manually indexed Croatian legislative documents, which we make freely available. Second, we describe the multi-label classification experiments on this collection. A challenge of EuroVoc indexing is class sparsity, and we discuss some strategies to address it. Our best model achieves a precision of 79.7%, recall of 60.2%, and F1 score of 68.6%.
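
The three figures reported in the abstract can be checked against each other. The sketch below assumes F1 is the usual harmonic mean of precision and recall. The abstract does not say which averaging scheme the paper uses, so the micro-averaged helper and the label names in the toy example are illustrative assumptions, not the authors' evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_prf(gold: list[set[str]], predicted: list[set[str]]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-document label sets.

    Assumption: micro-averaging; the abstract does not name the scheme used.
    """
    true_positives = sum(len(g & p) for g, p in zip(gold, predicted))
    n_predicted = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Precision and recall as reported in the abstract.
print(round(100 * f1_score(0.797, 0.602), 1))  # 68.6

# Toy example with hypothetical descriptor labels (not taken from the paper).
gold_labels = [{"fisheries", "taxation"}, {"customs"}]
predicted_labels = [{"fisheries"}, {"customs", "trade"}]
print(micro_prf(gold_labels, predicted_labels))
```

The first call reproduces the 68.6% F1 score from the reported precision and recall. The second shows how multi-label predictions are scored as sets of labels per document rather than as a single class.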

Original language
English

Scientific fields
Computer science



WORK AFFILIATIONS


Project / topic
036-1300646-1986 - Knowledge Discovery in Textual Data (Otkrivanje znanja u tekstnim podacima) (Bojana Dalbelo-Bašić)

Institutions
Faculty of Electrical Engineering and Computing, Zagreb

Profiles:

Frane Šarić (author)

Bojana Dalbelo-Bašić (author)

Jan Šnajder (author)

Cite this publication

Šarić, Frane; Dalbelo Bašić, Bojana; Moens, Marie-Francine; Šnajder, Jan
Multi-label Classification of Croatian Legal Documents Using EuroVoc Thesaurus // Proceedings of Workshop on Semantic Processing of Legal Texts (SPLeT2014)
Reykjavik: European Language Resources Association, 2014, pp. 7-12 (lecture, international peer review, full paper (in extenso), scientific)
Šarić, F., Dalbelo Bašić, B., Moens, M. & Šnajder, J. (2014) Multi-label Classification of Croatian Legal Documents Using EuroVoc Thesaurus. In: Proceedings of Workshop on Semantic Processing of Legal Texts (SPLeT2014).
@inproceedings{article,
  author = {Šarić, Frane and Dalbelo Bašić, Bojana and Moens, Marie-Francine and Šnajder, Jan},
  title = {Multi-label Classification of Croatian Legal Documents Using EuroVoc Thesaurus},
  booktitle = {Proceedings of Workshop on Semantic Processing of Legal Texts (SPLeT2014)},
  year = {2014},
  pages = {7--12},
  keywords = {multi-label classification, automatic indexing, class sparsity, EuroVoc thesaurus, legal documents},
  publisher = {European Language Resources Association},
  address = {Reykjavik, Iceland}
}