Bibliographic record number: 1131596

Corpus Analysis of Complex Names with Common Nouns in Croatian


Matas Ivanković, Ivana; Blagus Bartolec, Goranka
Corpus Analysis of Complex Names with Common Nouns in Croatian // Computational and Corpus-based Phraseology: Proceedings of the Third International Conference EUROPHRAS 2019 / Corpas Pastor, Gloria ; Mitkov, Ruslan ; Kunilovskaya, Maria ; Losey León, María Araceli (eds.).
Geneva: Editions Tradulex, 2019. pp. 106-113 doi:10.26615/978-2-9701095-6-3_014


CROSBI ID: 1131596

Title
Corpus Analysis of Complex Names with Common Nouns in Croatian

Authors
Matas Ivanković, Ivana ; Blagus Bartolec, Goranka

Type, subtype and category of work
Book chapters, scientific

Book
Computational and Corpus-based Phraseology: Proceedings of the Third International Conference EUROPHRAS 2019

Editor(s)
Corpas Pastor, Gloria ; Mitkov, Ruslan ; Kunilovskaya, Maria ; Losey León, María Araceli

Publisher
Editions Tradulex

City
Geneva

Year
2019

Page range
106-113

ISBN
978-2-9701095-6-3

Keywords
Complex Names, Croatian Orthography, Corpus Search

Abstract
The goal of this corpus-based research is to determine whether complex names that contain common nouns can be extracted from the Croatian hrWaC v2.2 corpus using regular expressions, i.e. to what extent a capital letter (other than one following a full stop, exclamation mark or question mark) can be taken as an indication of a name. A common noun can be used as a regular noun or as a constituent of a complex name, which on the one hand makes such nouns difficult to tag automatically and on the other hand affects their lexicographic description. Using regular expressions, we searched for capitalized common nouns and for sequences in which a capitalized attribute comes first and is followed by the common noun. After analyzing 1000 examples from each search, we divided the results into two groups: names, and sequences with an uppercase letter that are not names. Some of the causes of extracting “false” names are technical (e.g. punctuation: sentences separated by a paragraph mark (¶), missing punctuation at the end of a sentence; whole parts of text written in upper case...), while others lie in the texts crawled for hrWaC, which are not written in accordance with Croatian orthography.
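
The two searches described in the abstract can be illustrated with a rough sketch. The abstract does not give the actual expressions or the tool used to query hrWaC, so everything below is an assumption: the function names, the character classes and the sample noun list are hypothetical, the code works on plain text rather than on an annotated corpus, and it matches exact word forms only (no inflection).

```python
import re

# Croatian upper- and lowercase letters, including diacritics (simplified; digraphs ignored).
UPPER = "A-ZČĆĐŠŽ"
LOWER = "a-zčćđšž"

# Assumed approximation of "not the capital after a full stop, exclamation
# mark or question mark": the match may not start at the very beginning of
# the text, nor right after one of . ! ? followed by a single whitespace.
NOT_SENTENCE_START = r"(?<!^)(?<![.!?]\s)"


def capitalized_nouns(text, nouns):
    """Return capitalized occurrences of the given nouns in mid-sentence position."""
    alternatives = "|".join(re.escape(n.capitalize()) for n in nouns)
    pattern = re.compile(NOT_SENTENCE_START + r"\b(?:" + alternatives + r")\b")
    return [m.group(0) for m in pattern.finditer(text)]


def capitalized_attribute_plus_noun(text, nouns):
    """Return sequences of a capitalized word directly followed by a lowercase noun."""
    alternatives = "|".join(re.escape(n) for n in nouns)
    pattern = re.compile(
        NOT_SENTENCE_START + rf"\b[{UPPER}][{LOWER}]+\s(?:{alternatives})\b"
    )
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    # Invented sample text; "škola" (school) stands in for a common noun of interest.
    sample = "Radi u Osnovna škola Centar. Škola je nova. Stara Škola je zatvorena."
    print(capitalized_nouns(sample, ["škola"]))
    print(capitalized_attribute_plus_noun(sample, ["škola"]))
```

Hits returned by such patterns are only candidates. As the abstract notes, sentences separated by a paragraph mark, missing end-of-sentence punctuation and text written entirely in upper case all produce capitalized sequences that are not names, so manual review of the results is still needed.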

Original language
English

Scientific fields
Philology



RELATED TO THIS WORK


Institutions:
Institut za hrvatski jezik i jezikoslovlje, Zagreb

Links to the full text of the work:

doi www.tradulex.com

Cite this publication:

Matas Ivanković, I. & Blagus Bartolec, G. (2019) Corpus Analysis of Complex Names with Common Nouns in Croatian. In: Corpas Pastor, G., Mitkov, R., Kunilovskaya, M. & Losey León, M. (eds.) Computational and Corpus-based Phraseology: Proceedings of the Third International Conference EUROPHRAS 2019. Geneva, Editions Tradulex, pp. 106-113 doi:10.26615/978-2-9701095-6-3_014.
@inbook{inbook, author = {Matas Ivankovi\'{c}, Ivana and Blagus Bartolec, Goranka}, year = {2019}, pages = {106-113}, doi = {10.26615/978-2-9701095-6-3\_014}, keywords = {Complex Names, Croatian Orthography, Corpus Search}, isbn = {978-2-9701095-6-3}, title = {Corpus Analysis of Complex Names with Common Nouns in Croatian}, publisher = {Editions Tradulex}, publisherplace = {Geneva} }
