Data source: CROSBI

Corpus Analysis of Complex Names with Common Nouns in Croatian (CROSBI ID 70141)

Book chapter | original scientific paper | international peer review

Matas Ivanković, Ivana ; Blagus Bartolec, Goranka. Corpus Analysis of Complex Names with Common Nouns in Croatian // Computational and Corpus-based Phraseology: Proceedings of the Third International Conference EUROPHRAS 2019 / Corpas Pastor, Gloria ; Mitkov, Ruslan ; Kunilovskaya, Maria et al. (eds.). Geneva: Editions Tradulex, 2019. pp. 106-113. doi: 10.26615/978-2-9701095-6-3_014

Responsibility details

Matas Ivanković, Ivana ; Blagus Bartolec, Goranka

English

Corpus Analysis of Complex Names with Common Nouns in Croatian

The goal of this corpus-based research is to determine whether complex names containing common nouns can be extracted from the Croatian hrWaC v2.2 corpus using regular expressions, i.e. to what extent a capital letter (other than one following a full stop, exclamation mark, or question mark) can be taken as an indication of a name. A common noun can be used as a regular noun or as a constituent of a complex name, which, on the one hand, makes automatic tagging difficult and, on the other hand, affects the lexicographic description. Using regular expressions, we searched for capitalized common nouns and for sequences in which a capitalized attribute comes first and is followed by the common noun. After analyzing 1000 examples from each search, we divided the results into two groups: names, and sequences with an uppercase letter that are not names. Some of the causes of extracting “false” names are technical (e.g. punctuation: sentences separated by a paragraph mark (¶), missing punctuation at the end of a sentence, whole parts of text written in upper case...), and others lie in the texts crawled for hrWaC, which are not written in accordance with Croatian orthography.
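
The two searches described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' actual query (the record does not give their expressions or the corpus query interface they used); the noun list, sample text, and function names below are hypothetical, and inflected forms of the nouns are deliberately ignored. The lookbehind approximates "a capital letter not after a full stop, exclamation mark, or question mark" by requiring that the candidate follow a whitespace character that is itself preceded by something other than sentence-final punctuation.

```python
import re

# Hypothetical inputs, for illustration only (not taken from the paper).
COMMON_NOUNS = ["ulica", "trg", "more"]
SAMPLE_TEXT = (
    "Njegova adresa je Vlaška ulica 5. Ulica je duga. "
    "Danas je otvoren Trg bana Jelačića!"
)

UPPER = "A-ZČĆĐŠŽ"  # Croatian upper-case letters
LOWER = "a-zčćđšž"  # Croatian lower-case letters

# Fixed-width lookbehind: the candidate must follow a whitespace character
# whose predecessor is neither whitespace nor one of . ! ?
# This also excludes a candidate at the very start of the text.
NOT_SENTENCE_START = r"(?<=[^.!?\s]\s)"


def capitalized_noun_pattern(nouns):
    """Search 1: a common noun written with an initial capital mid-sentence."""
    alternation = "|".join(re.escape(n.capitalize()) for n in nouns)
    return re.compile(NOT_SENTENCE_START + r"(?:" + alternation + r")\b")


def attribute_noun_pattern(nouns):
    """Search 2: a capitalized attribute followed by a lower-case common noun."""
    alternation = "|".join(re.escape(n) for n in nouns)
    return re.compile(
        NOT_SENTENCE_START
        + r"[" + UPPER + r"][" + LOWER + r"]+\s+(?:" + alternation + r")\b"
    )


def find_candidates(text, nouns):
    """Return candidate name strings for both searches (base forms only)."""
    return {
        "capitalized_noun": [
            m.group(0) for m in capitalized_noun_pattern(nouns).finditer(text)
        ],
        "attribute_plus_noun": [
            m.group(0) for m in attribute_noun_pattern(nouns).finditer(text)
        ],
    }


if __name__ == "__main__":
    print(find_candidates(SAMPLE_TEXT, COMMON_NOUNS))
```

On the sample text, the sentence-initial "Ulica" is skipped because it follows a full stop, while "Trg" and "Vlaška ulica" are returned as candidates. A sketch like this would also reproduce the kind of "false" names the abstract mentions: text with missing end-of-sentence punctuation or written entirely in upper case would defeat the lookbehind heuristic, so the candidates still need manual review.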

Complex Names, Croatian Orthography, Corpus Search

Chapter details

106-113.

published

10.26615/978-2-9701095-6-3_014

Book details

Computational and Corpus-based Phraseology: Proceedings of the Third International Conference EUROPHRAS 2019

Corpas Pastor, Gloria ; Mitkov, Ruslan ; Kunilovskaya, Maria ; Losey León, María Araceli

Geneva: Editions Tradulex

2019.

978-2-9701095-6-3

Research field

Philology
