Inductive Morphosyntactic Tagsets (CROSBI ID 511181)
Unpublished conference participation | unpublished conference contribution
Responsibility details
Stojanov, Tomislav ; Vučković, Kristina ; Dovedan, Zdravko
English
Inductive Morphosyntactic Tagsets
There are a number of morphological generators for the Croatian language, such as those of Kržak (1988), Silić (1996), Tadić (1994, 2003), and others that are parts of Korektor©, Hrvatska Riječ©, Lapis© and other applications; all of them except Tadić's are used as spelling checkers. Tadić's GenOblik was developed for the needs of a corpus linguistics project and is annotated according to the Multext-East specification (Erjavec 2001), which Przepiórkowski & Woliński (2003a, b) have critically evaluated, adopting their own tagset that is closer to the grammatical system of Polish. This paper also starts from a criticism of that specification, but on different grounds. Two points are emphasized: (i) insufficient differentiation between inherent and relationally motivated morphosyntactic features: relational verb categories such as modality, conditionality and compound tense cannot be annotated by a tag attached to an individual lexical unit, because these features (in Croatian as well as in other languages) do not derive from the form as such but are relationally conditioned; (ii) lack of adherence to morphosyntactic criteria in establishing formal criteria: semantic features, such as the distinction between common and proper nouns, are introduced, whereas other semantic categories, such as countability, collectiveness, transitivity and optativity, are not included. Most of the criticism directed at the Multext-East specification reflects its so-called deductive approach to tagset design. A tagging system that relies on a more strongly qualitative approach is discussed in the second part of this paper. It explains the so-called inductive approach to creating a tagset, in which the tags are derived from the morphological generator itself; this avoids the disadvantages of the deductive tagset and yields greater grammatical reliability. This could contribute to greater accuracy in resolving homographic forms in the parser's algorithm.
Six arguments are made in favor of the inductive approach.
tagging; tagset; Croatian language; morphological generator; MULTEXT-East; morphosyntax; morphosyntactic category; morphosyntactic feature; adjective aspect; adjective indefiniteness; deductive method; inductive method; corpus linguistics; machine translat
The second review for publication in the print edition is still in progress.
Contribution details
not recorded
Conference details
Computational Modeling of Lexical Acquisition
lecture
25.08.2005-28.08.2005
Split, Croatia