Data source: CROSBI

ReSiPC: a Tool for Complex Searches in Parallel Corpora (CROSBI ID 698625)

Conference paper in proceedings | other | international peer review

Oliver, Antoni ; Mikelenić, Bojana ReSiPC: a Tool for Complex Searches in Parallel Corpora // Proceedings of The 12th Language Resources and Evaluation Conference / Calzolari, Nicoletta ; Béchet, Frédéric ; Blache, Philippe et al. (eds.). Marseille: European Language Resources Association (ELRA), 2020. pp. 7033-7037

Authorship details

Oliver, Antoni ; Mikelenić, Bojana

English

ReSiPC: a Tool for Complex Searches in Parallel Corpora

In this paper, a tool specifically designed to allow for complex searches in large parallel corpora is presented. The formalism for the queries is very powerful as it uses standard regular expressions that allow for complex queries combining word forms, lemmata and POS-tags. As queries are performed over POS-tags, at least one of the languages in the parallel corpus should be POS-tagged. Searches can be performed in one of the languages or in both languages at the same time. The program is able to POS-tag the corpora using the Freeling analyzer through its Python API. ReSiPC is developed in Python version 3 and it is distributed under a free license (GNU GPL). The tool can be used to provide data for contrastive linguistics research and an example of use in a Spanish-Croatian parallel corpus is presented. ReSiPC is designed for queries in POS-tagged corpora, but it can be easily adapted for querying corpora containing other kinds of information.
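To illustrate the kind of query the abstract describes, here is a minimal sketch of a regular-expression search over a POS-tagged parallel corpus. It is not ReSiPC's actual query syntax or file format: the `form|lemma|POS` token encoding, the shortened tags (`DA`, `NC`, `VA`, `VM`), the sample sentences, and the helper names `token` and `search` are all assumptions made for this example.

```python
import re

# Hypothetical toy corpus: (tagged Spanish sentence, Croatian translation).
# Each source token is encoded as "form|lemma|POS"; the tags are simplified
# placeholders, not the full tagset a real tagger would produce.
corpus = [
    ("La|el|DA casa|casa|NC es|ser|VS grande|grande|AQ", "Kuća je velika"),
    ("Él|él|PP ha|haber|VA comido|comer|VM", "On je jeo"),
    ("Los|el|DA niños|niño|NC juegan|jugar|VM", "Djeca se igraju"),
]

ANY = r"[^|\s]+"  # matches any single field (form, lemma or tag)


def token(form=ANY, lemma=ANY, pos=ANY):
    """Build a regex fragment matching one whole form|lemma|POS token."""
    # The lookarounds anchor the fragment to token boundaries (spaces or
    # the ends of the sentence), so a pattern cannot match mid-token.
    return rf"(?<!\S){form}\|{lemma}\|{pos}(?!\S)"


def search(pairs, pattern):
    """Return the aligned pairs whose tagged source side matches pattern."""
    regex = re.compile(pattern)
    return [(src, tgt) for src, tgt in pairs if regex.search(src)]


# Query: any form of the lemma "haber" directly followed by a main verb.
query = token(lemma="haber") + " " + token(pos=r"VM\w*")
for src, tgt in search(corpus, query):
    print(src, "=>", tgt)
```

Because each constraint is an ordinary regular expression, a word form, a lemma and a tag can be mixed freely within one token or across a sequence of tokens. Searching both languages at once would amount to applying a second pattern to the target side of each pair, provided that side is tagged as well.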

parallel corpora ; regular expressions ; contrastive linguistics

Due to the coronavirus pandemic, the conference was not held, but the proceedings were published on 2020-05-15.


Paper details

7033-7037.

2020.

published

Host publication details

Proceedings of The 12th Language Resources and Evaluation Conference

Calzolari, Nicoletta ; Béchet, Frédéric ; Blache, Philippe ; Choukri, Khalid ; Cieri, Christopher ; Declerck, Thierry ; Goggi, Sara ; Isahara, Hitoshi ; Maegaard, Bente ; Mariani, Joseph ; Mazo, Hélène ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios

Marseille: European Language Resources Association (ELRA)

979-10-95546-34-4

Conference details

The 12th Language Resources and Evaluation Conference (LREC2020)

poster

11-16 May 2020

Marseille, France

Research fields

Philology, Information and Communication Sciences
