Bibliographic record no.: 217175
Inductive Morphosyntactic Tagsets
Inductive Morphosyntactic Tagsets // Computational Modeling of Lexical Acquisition
Split, Croatia, 2005 (lecture, not peer-reviewed, unpublished paper, scientific)
CROSBI ID: 217175
Title
Inductive Morphosyntactic Tagsets
Authors
Stojanov, Tomislav ; Vučković, Kristina ; Dovedan, Zdravko
Type, subtype and category of work
Conference abstracts, unpublished paper, scientific
Conference
Computational Modeling of Lexical Acquisition
Place and date
Split, Croatia, 25.08.2005 - 28.08.2005
Type of participation
Lecture
Type of review
Not peer-reviewed
Keywords
tagging; tagset; Croatian language; morphological generator; MULTEXT-East; morphosyntax; morphosyntactic category; morphosyntactic feature; adjective aspect; adjective indefiniteness; deductive method; inductive method; corpus linguistics; machine translat
Abstract
There are a number of morphological generators for the Croatian language, such as Kržak (1988), Silić (1996), Tadić (1994, 2003), and others that are parts of Korektor©, Hrvatska Riječ©, Lapis© and other applications, all of which, except Tadić's, are used as spelling checkers. Tadić's GenOblik was developed for the needs of a corpus linguistics project and is annotated according to the Multext-East specification (Erjavec 2001), which Przepiórkowski & Woliński (2003a, b) have critically evaluated, adopting their own tagset that is closer to the grammatical system of the Polish language. This paper also starts from a critique of that specification, but on different grounds. The following points are emphasized: (i) insufficient differentiation between inherent and relationally motivated morphosyntactic features: verb relational categories such as modality, conditionality and compound tense cannot be annotated by a tag added to an individual lexical unit, because these features (in Croatian as well as in other languages) do not derive from the form as such but are relationally conditioned; (ii) lack of adherence to morphosyntactic criteria in establishing formal criteria: semantic features, like the category of common and proper noun, are introduced, whereas other semantic categories, like countability, collectiveness, transitivity, and optativity, are not included. Most of the criticism of the Multext-East specification relates to its so-called deductive approach to tagset design. A tagging system that relies on a more strongly qualitative approach is discussed in the second part of this paper. It explains the so-called inductive approach to creating a system of tags, in which the tags are derived from the morphological generator itself; this avoids the disadvantages of a deductive system of tags and achieves greater grammatical reliability. This could contribute to greater accuracy in resolving homographic forms in the parser's algorithm.
Six arguments are made in favor of the inductive approach.
Original language
English
Scientific fields
Computer science, Information and communication sciences, Philology
Note
The second review for publication in the printed edition is still in progress.
WORK AFFILIATIONS
Institutions:
Filozofski fakultet, Zagreb,
Institut za hrvatski jezik i jezikoslovlje, Zagreb