Data source: CROSBI

Problems of free text searching in flective languages, The Third International Conference: Information Technology and Journalism - Interactive publishing in Central Europe, Dubrovnik, Inter University Center, 25-29 May 1998 (CROSBI ID 752447)

Other types of work | other

Boras, Damir ; Lauc, Tomislava ; Ristov, Strahil Problems of free text searching in flective languages, The Third International Conference: Information Technology and Journalism - Interactive publishing in Central Europe, Dubrovnik, Inter University Center, 25-29 May 1998 // Problems of free text searching in flective languages, The Third International Conference: Informati. 1998.

Authorship information

Boras, Damir ; Lauc, Tomislava ; Ristov, Strahil

English

Problems of free text searching in flective languages, The Third International Conference: Information Technology and Journalism - Interactive publishing in Central Europe, Dubrovnik, Inter University Center, 25-29 May 1998

There are three basic problems in preparing natural-language text for information retrieval: 1. segmenting the text into smaller units (discourses or sentences); 2. recognizing words; and 3. preparing an index that is sufficiently compact and at the same time offers very fast access. All these seemingly trivial tasks can become extremely complex for a free-word-order (or flective) language such as Croatian. Croatian has many specific features that make it impossible to apply English-based algorithms to Croatian texts. Because its word order is free, it is very difficult to determine which elements of a sentence or discourse are connected to each other; to address this, a Croatian text segmentation model was designed. On the other hand, in a flective language every word has several word forms, and it is not always obvious to which basic word (lemma) a given token (word form) belongs. In written Croatian, nouns can have up to ten and verbs up to twenty different forms. To solve the word-form recognition and lemmatization problems, several tools have been designed: a lexical database of standard Croatian, a corpus-based Croatian word-tagging system, and a robust Croatian proper-name recognition system. To obtain indexes that are sufficiently compact yet fast to search, an original method of LZ (Lempel-Ziv) compression of static tries holding searchable data has been developed; it has proved viable for very large sets of static natural-language data of any kind.
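To make the word-form problem concrete: a free-text search for a Croatian noun such as "knjiga" (book) should also match "knjigu", "knjizi", "knjigom" and its other inflected forms. The following is a minimal sketch of lexicon-based lemmatization feeding a lemma-keyed inverted index, assuming a tiny hand-written form-to-lemma table (`LEXICON`, hypothetical) as a stand-in for the lexical database described above. Forms are stored in a plain, uncompressed character trie; this does not reproduce the authors' tagging system, proper-name recognizer, or LZ compression of static tries, and every name in it is illustrative.

```python
from collections import defaultdict


class TrieNode:
    __slots__ = ("children", "lemmas")

    def __init__(self):
        self.children = {}   # next character -> child node
        self.lemmas = set()  # lemmas whose paradigm contains the form ending here


class FormTrie:
    """Plain (uncompressed) character trie mapping word forms to lemma(s)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, form, lemma):
        node = self.root
        for ch in form:
            node = node.children.setdefault(ch, TrieNode())
        node.lemmas.add(lemma)  # a set, because one form may belong to several lemmas

    def lemmas_of(self, form):
        node = self.root
        for ch in form:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.lemmas)


# Hypothetical lexicon fragment: lemma -> some of its written forms.
LEXICON = {
    "knjiga": ["knjiga", "knjige", "knjizi", "knjigu", "knjigo", "knjigom", "knjigama"],
    "čitati": ["čitati", "čitam", "čitaš", "čita", "čitamo", "čitate", "čitaju"],
}


def build_trie(lexicon):
    trie = FormTrie()
    for lemma, forms in lexicon.items():
        for form in forms:
            trie.insert(form, lemma)
    return trie


def tokenize(text):
    # Naive whitespace tokenizer: strips surrounding punctuation and lowercases.
    return [t.strip(".,;:!?\"'()").lower() for t in text.split()]


def lemma_index(docs, trie):
    """Inverted index keyed by lemma; unknown tokens are indexed as themselves."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            for key in trie.lemmas_of(token) or {token}:
                index[key].add(doc_id)
    return index


def search(query, index, trie):
    """Return ids of documents containing every query word in any of its forms."""
    hits = None
    for token in tokenize(query):
        keys = trie.lemmas_of(token) or {token}
        docs = set().union(*(index.get(k, set()) for k in keys))
        hits = docs if hits is None else hits & docs
    return hits or set()
```

For example, with `docs = {1: "Čitam knjigu.", 2: "Knjige su na stolu.", 3: "Ona čita novine."}`, the query `"čitati knjige"` is reduced to the lemmas `čitati` and `knjiga` and matches only document 1, although neither query word appears there in that exact form. Keeping a set of lemmas per trie node leaves room for homographic forms that belong to more than one lemma.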

free text searching; flective languages


Publication details

Problems of free text searching in flective languages, The Third International Conference: Informati

1998.


Published

Research field

Information and Communication Sciences