Bibliographic record no.: 217175

Inductive Morphosyntactic Tagsets


Stojanov, Tomislav; Vučković, Kristina; Dovedan, Zdravko
Inductive Morphosyntactic Tagsets // Computational Modeling of Lexical Acquisition
Split, Croatia, 2005 (lecture, not peer-reviewed, unpublished paper, scientific)


CROSBI ID: 217175

Title
Inductive Morphosyntactic Tagsets

Authors
Stojanov, Tomislav ; Vučković, Kristina ; Dovedan, Zdravko

Type, subtype and category of work
Conference abstracts, unpublished paper, scientific

Conference
Computational Modeling of Lexical Acquisition

Place and date
Split, Croatia, 25-28 August 2005

Type of participation
Lecture

Type of review
Not peer-reviewed

Keywords
tagging; tagset; Croatian language; morphological generator; MULTEXT-East; morphosyntax; morphosyntactic category; morphosyntactic feature; adjective aspect; adjective indefiniteness; deductive method; inductive method; corpus linguistics; machine translat

Abstract
There are a number of morphological generators for Croatian, such as Kržak (1988), Silić (1996), Tadić (1994, 2003), and others that are part of Korektor©, Hrvatska Riječ©, Lapis© and other applications, all of which, except Tadić's, are used as spelling checkers. Tadić's GenOblik was developed for a corpus linguistics project and is annotated according to the Multext-East specification (Erjavec 2001), which Przepiórkowski & Woliński (2003a, b) critically evaluated, adopting their own tagset closer to the grammatical system of Polish. This paper also starts from a criticism of that specification, but on different grounds. The following points are emphasized: (i) insufficient differentiation between inherent and relationally motivated morphosyntactic features: verbal relational categories such as modality, conditionality and compound tense cannot be annotated by a tag attached to an individual lexical unit, because these features (in Croatian as well as in other languages) do not derive from the form as such but are relationally conditioned; (ii) lack of adherence to morphosyntactic criteria in establishing formal criteria: semantic features, such as the distinction between common and proper nouns, are introduced, whereas other semantic categories, such as countability, collectiveness, transitivity and optativity, are not included. Most of the critique of the Multext-East specification reflects the so-called deductive approach to tagset design. A tagging system that relies on a more pronounced qualitative approach is discussed in the second part of the paper. It explains the so-called inductive approach to creating a tag system, in which the tags are derived from the morphological generator itself; this avoids the disadvantages of a deductive tag system and gains greater grammatical reliability. This could contribute to greater accuracy in resolving homographic forms in the parser's algorithm.
Six arguments are made in favor of the inductive approach.
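
The abstract gives no implementation details, so the following is only a minimal sketch of what "deriving tags from the generator itself" might look like. Everything in it is an assumption made for illustration: the toy ending table, the function names `generate`, `to_tag` and `induce_tagset`, and the `key=value` tag format are hypothetical and are not taken from GenOblik, Multext-East or the paper.

```python
from collections import defaultdict

# Hypothetical toy paradigm (a-declension nouns, a few cases only).
# Illustrative data, not taken from GenOblik or the paper.
TOY_NOUN_ENDINGS = [
    ("a", {"case": "nom", "number": "sg"}),
    ("e", {"case": "gen", "number": "sg"}),
    ("i", {"case": "dat", "number": "sg"}),
    ("u", {"case": "acc", "number": "sg"}),
    ("e", {"case": "nom", "number": "pl"}),
    ("e", {"case": "acc", "number": "pl"}),
]


def generate(stem, pos="noun"):
    """Toy morphological generator: yield (form, features) pairs for a stem."""
    for ending, feats in TOY_NOUN_ENDINGS:
        yield stem + ending, {"pos": pos, **feats}


def to_tag(feats):
    """Serialize a feature bundle as a canonical tag string (keys sorted)."""
    return "|".join(f"{key}={value}" for key, value in sorted(feats.items()))


def induce_tagset(stems):
    """Collect the tags the generator actually emits and index them by form."""
    tagset = set()
    analyses = defaultdict(set)
    for stem in stems:
        for form, feats in generate(stem):
            tag = to_tag(feats)
            tagset.add(tag)
            analyses[form].add(tag)
    return tagset, dict(analyses)


tagset, analyses = induce_tagset(["žen", "kuć"])

# Forms that received more than one tag are homographs a parser must resolve.
homographs = {form: tags for form, tags in analyses.items() if len(tags) > 1}
```

In this sketch the tagset is simply the set of feature bundles the generator produces, so no tag exists that the generator cannot realize, and homographic forms such as "žene" surface as entries with several candidate tags.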

Original language
English

Scientific fields
Computing, Information and Communication Sciences, Philology

Note
A second review for publication in the print edition is still in progress.



LINKED TO THIS WORK


Projects:
0130440
0212010

Institutions:
Filozofski fakultet, Zagreb
Institut za hrvatski jezik i jezikoslovlje, Zagreb

Cite this publication

Stojanov, T., Vučković, K. & Dovedan, Z. (2005) Inductive Morphosyntactic Tagsets. In: Computational Modeling of Lexical Acquisition.
@article{article,
  year = {2005},
  keywords = {tagging, tagset, Croatian language, morphological generator, MULTEXT-East, morphosyntax, morphosyntactic category, morphosyntactic feature, adjective aspect, adjective indefiniteness, deductive method, inductive method, corpus linguistics, machine translat},
  title = {Inductive Morphosyntactic Tagsets},
  publisherplace = {Split, Hrvatska}
}