Bibliographic record no. 1242331

Sharing high-quality language resources in the legal domain to develop neural machine translation for under-resourced European languages


Bago, Petra; Castilho, Sheila; Celeste, Edoardo; Dunne, Jane; Gaspari, Federico; Gíslason, Níels Rúnar; Kåsen, Andre; Klubička, Filip; Kristmannsson, Gauti; McHugh, Helen et al.
Sharing high-quality language resources in the legal domain to develop neural machine translation for under-resourced European languages // Revista de Llengua i Dret, Journal of Language and Law, 78 (2022), 9-34 doi:10.2436/rld.i78.2022.3741 (international peer review, article, scientific)


CROSBI ID: 1242331

Title
Sharing high-quality language resources in the legal domain to develop neural machine translation for under-resourced European languages

Authors
Bago, Petra; Castilho, Sheila; Celeste, Edoardo; Dunne, Jane; Gaspari, Federico; Gíslason, Níels Rúnar; Kåsen, Andre; Klubička, Filip; Kristmannsson, Gauti; McHugh, Helen; Moran, Róisín; Ní Loinsigh, Órla; Olsen, Jon Arild; Parra Escartín, Carla; Ramesh, Akshai; Resende, Natalia; Sheridan, Páraic; Way, Andy

Source
Revista de Llengua i Dret, Journal of Language and Law (ISSN 0212-5056), 78 (2022), 9-34

Type, subtype and category of work
Journal papers, article, scientific

Keywords
Language resources; low-resource languages; legal translation; neural machine translation (MT); evaluation

Abstract
This article reports some of the main achievements of the EU-funded PRINCIPLE project in collecting high-quality language resources (LRs) in the legal domain for four under-resourced European languages, namely Croatian, Irish, Norwegian and Icelandic. After illustrating the significance of this work for developing translation technologies in the context of the European Union and the European Economic Area, the paper outlines the main steps of data collection, curation and sharing of the LRs gathered with the support of public and private data contributors. This is followed by a description of the development pipeline and key features of the state-of-the-art bespoke neural machine translation (MT) engines for the legal domain that were built using this data. The MT systems were evaluated with a combination of automatic and human methods to validate the quality of the LRs collected in the project; the high-quality LRs were subsequently shared with the wider community via the ELRC-SHARE repository. The main challenges encountered in this work are discussed, emphasising the importance and the key benefits of sharing high-quality digital LRs.

Original language
English

Scientific fields
Information and communication sciences, Interdisciplinary social sciences, Interdisciplinary humanities



RELATED TO THIS WORK


Projects:
EK-CEF Telecom-INEA/CEF/ICT/A2018/1761837 - Providing Resources in Irish, Norwegian, Croatian and Icelandic for Purposes of Language Engineering (PRINCIPLE) (Bago, Petra, EK - 2018-EU-IA-0050) (CroRIS)

Institutions:
Filozofski fakultet, Zagreb

Profiles:

Petra Bago (author)

Links to the full text:

doi: 10.2436/rld.i78.2022.3741 (dx.doi.org)

Cite this publication:

Bago, P., Castilho, S., Celeste, E., Dunne, J., Gaspari, F., Gíslason, N., Kåsen, A., Klubička, F., Kristmannsson, G. & McHugh, H. (2022) Sharing high-quality language resources in the legal domain to develop neural machine translation for under-resourced European languages. Revista de Llengua i Dret, Journal of Language and Law, 78, 9-34 doi:10.2436/rld.i78.2022.3741.
@article{article,
  author = {Bago, Petra and Castilho, Sheila and Celeste, Edoardo and Dunne, Jane and Gaspari, Federico and G\'{\i}slason, N\'{\i}els R\'{u}nar and K{\aa}sen, Andre and Klubi\v{c}ka, Filip and Kristmannsson, Gauti and McHugh, Helen and Moran, R\'{o}is\'{\i}n and N\'{\i} Loinsigh, \'{O}rla and Olsen, Jon Arild and Parra Escart\'{\i}n, Carla and Ramesh, Akshai and Resende, Natalia and Sheridan, P\'{a}raic and Way, Andy},
  title = {Sharing high-quality language resources in the legal domain to develop neural machine translation for under-resourced European languages},
  journal = {Revista de Llengua i Dret, Journal of Language and Law},
  year = {2022},
  volume = {78},
  pages = {9--34},
  issn = {0212-5056},
  doi = {10.2436/rld.i78.2022.3741},
  keywords = {Language resources, low-resource languages, legal translation, neural machine translation (MT), evaluation}
}

The journal is indexed in:


  • Web of Science Core Collection (WoSCC)
    • Emerging Sources Citation Index (ESCI)
  • Scopus

