Corpus Analysis of Complex Names with Common Nouns in Croatian (CROSBI ID 70141)
Book chapter | original scientific paper | international peer review
Statement of responsibility
Matas Ivanković, Ivana ; Blagus Bartolec, Goranka
English
Corpus Analysis of Complex Names with Common Nouns in Croatian
The goal of this corpus-based study is to determine whether complex names that contain common nouns can be extracted from the Croatian hrWaC v2.2 corpus by means of regular expressions, i.e. to what extent a capital letter (other than one following a full stop, exclamation mark or question mark) can be taken as an indication of a name. A common noun can be used as a regular noun or as a constituent of a complex name, which on the one hand makes automatic tagging difficult and on the other hand affects lexicographic description. Using regular expressions, we searched for capitalized common nouns and for sequences in which a capitalized attribute comes first and is followed by a common noun. After analyzing 1000 examples from each search, we divided the results into two groups: names, and sequences with an uppercase letter that are not names. Some causes of extracting “false” names are technical (e.g. punctuation issues such as sentences separated by a paragraph mark (¶) or missing punctuation at the end of a sentence; whole parts of text written in upper case...), while others lie in the texts crawled for hrWaC, which are not written in accordance with Croatian orthography.
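The two searches described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' actual query: the record does not give the expressions or the query tool used on hrWaC, so the function name `find_candidates`, the sample nouns and the exact patterns are assumptions made for illustration. The sketch treats a capital letter as a name candidate only when the preceding token does not end in a full stop, exclamation mark or question mark, and it matches only the exact noun form given (no inflected forms).

```python
import re

# Lookbehind: the match must be preceded by a character that is neither
# whitespace nor sentence-final punctuation, followed by one whitespace
# character. This also rules out a match at the very start of the text.
MID_SENTENCE = r"(?<=[^\s.!?]\s)"

# Capitalized word in Croatian Latin script (assumed character set).
CAP_WORD = r"[A-ZČĆĐŠŽ][a-zčćđšž]+"


def find_candidates(text, noun):
    """Return name candidates for one lowercase common noun (hypothetical helper)."""
    capitalized = noun.capitalize()
    # Search 1: the common noun itself written with a capital letter.
    capitalized_noun = re.compile(MID_SENTENCE + re.escape(capitalized) + r"\b")
    # Search 2: a capitalized attribute followed by the lowercase common noun.
    attribute_noun = re.compile(
        MID_SENTENCE + "(" + CAP_WORD + r")\s+" + re.escape(noun) + r"\b"
    )
    return {
        "capitalized_noun": [m.group(0) for m in capitalized_noun.finditer(text)],
        "attribute_noun": [m.group(0) for m in attribute_noun.finditer(text)],
    }


sample = "Ljetovali smo uz Jadransko more. More je bilo toplo."
print(find_candidates(sample, "more"))
```

In the sample sentence the sentence-initial "More" is skipped because it follows a full stop, while "Jadransko more" is returned as an attribute-plus-noun candidate. A sentence that ends without punctuation, or text written entirely in upper case, would still slip through such patterns, which corresponds to the technical sources of "false" names listed in the abstract.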
Complex Names, Croatian Orthography, Corpus Search
Chapter data
106-113.
published
10.26615/978-2-9701095-6-3_014
Book data
Computational and Corpus-based Phraseology: Proceedings of the Third International Conference EUROPHRAS 2019
Corpas Pastor, Gloria ; Mitkov, Ruslan ; Kunilovskaya, Maria ; Losey León, María Araceli
Geneva: Editions Tradulex
2019.
978-2-9701095-6-3