Problems of free text searching in flective languages, The Third International Conference: Information Technology and Journalism - Interactive publishing in Central Europe, Dubrovnik, Inter University Center, 25-29 May 1998

Boras, Damir; Lauc, Tomislava; Ristov, Strahil

Data source: CROSBI

CROSBI ID: 752447

Other types of works | other

Boras, Damir; Lauc, Tomislava; Ristov, Strahil. Problems of free text searching in flective languages, The Third International Conference: Information Technology and Journalism - Interactive publishing in Central Europe, Dubrovnik, Inter University Center, 25-29 May 1998 // Problems of free text searching in flective languages, The Third International Conference: Informati. 1998.

Responsibility data

Authors

Boras, Damir; Lauc, Tomislava; Ristov, Strahil

Basic data in the original language

Language

English

Title

Problems of free text searching in flective languages, The Third International Conference: Information Technology and Journalism - Interactive publishing in Central Europe, Dubrovnik, Inter University Center, 25-29 May 1998

Abstract

There are three basic problems in preparing natural language text for information retrieval: 1. segmenting text into smaller units (discourses or sentences); 2. word recognition; and 3. index preparation, since the index should be sufficiently compact and at the same time offer very fast access. All these seemingly trivial tasks can become extremely complex for a free-word-order, inflective language such as Croatian. Croatian has many specific features that make it impossible to apply English-based algorithms to Croatian texts. Because word order is free, it is very difficult to determine which elements of a sentence or discourse are connected to each other; the Croatian text segmentation model was designed to solve this problem. On the other hand, in inflective languages every word has several word forms, and it is not always obvious to which basic word (lemma) a given token (word form) belongs. In written Croatian, nouns can have up to ten and verbs up to twenty different forms. To solve the word-form recognition and lemmatization problems, several tools have been designed: a lexical database of standard Croatian, a corpus-based Croatian word-tagging system, and a robust Croatian proper-name recognition system. To prepare indexes that are sufficiently compact yet fast to search, an original LZ (Lempel-Ziv) compression method for static tries of searchable data has been developed, which has proved viable for very large static sets of natural language data of any kind.
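The word-form problem described in the abstract can be illustrated with a minimal sketch, assuming a small hand-written lexicon that maps Croatian word forms to their lemmas. The names `LEXICON`, `tokenize`, `lemmatize`, `build_index` and `search` are hypothetical, and this is a plain lemma-based inverted index, not the authors' tagging system or their LZ-compressed trie index. It shows why an exact-string index misses inflected forms (a query for *kuća*, "house", does not match *kući* or *kuću*), while indexing every token under its lemma lets any form of the query word find any form in the text.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon: word form -> lemma. A real system would draw on a
# full lexical database of standard Croatian rather than a hand-written dict.
LEXICON = {
    "kuća": "kuća", "kuće": "kuća", "kući": "kuća", "kuću": "kuća",
    "kućo": "kuća", "kućom": "kuća", "kućama": "kuća",
    "velik": "velik", "velika": "velik", "velikoj": "velik", "veliku": "velik",
    "stanujem": "stanovati", "vidim": "vidjeti", "u": "u",
}


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase, then take runs of word characters.
    # In Python 3, \w also matches Croatian letters such as č, ć, đ, š, ž.
    return re.findall(r"\w+", text.lower())


def lemmatize(token: str) -> str:
    # Unknown tokens fall back to themselves (no guessing of lemmas).
    return LEXICON.get(token, token)


def build_index(docs: dict[int, str], use_lemmas: bool = True) -> dict[str, set[int]]:
    # Inverted index: key (lemma or raw form) -> set of document ids.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            key = lemmatize(token) if use_lemmas else token
            index[key].add(doc_id)
    return dict(index)


def search(index: dict[str, set[int]], query: str, use_lemmas: bool = True) -> set[int]:
    # Boolean AND query: documents containing every query word (in any form
    # when use_lemmas is True, only in the exact form otherwise).
    keys = [lemmatize(t) if use_lemmas else t for t in tokenize(query)]
    if not keys:
        return set()
    result = set(index.get(keys[0], set()))
    for key in keys[1:]:
        result &= index.get(key, set())
    return result


if __name__ == "__main__":
    docs = {
        1: "Stanujem u velikoj kući.",  # "I live in a big house."
        2: "Vidim veliku kuću.",        # "I see a big house."
        3: "Vidim more.",               # "I see the sea."
    }
    print(search(build_index(docs), "kuća"))                                   # {1, 2}
    print(search(build_index(docs, use_lemmas=False), "kuća", use_lemmas=False))  # set()
    print(search(build_index(docs), "vidim kuću"))                             # {2}
```

A plain dictionary assigns each word form exactly one lemma, so this sketch cannot represent forms that belong to more than one lemma; resolving that kind of ambiguity is the task the abstract assigns to the corpus-based word-tagging system.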

Keywords

free text searching; flective languages

Note

not recorded


Publication data

Source title

Problems of free text searching in flective languages, The Third International Conference: Informati

Year of publication

1998.

Volume (issue)

not recorded

Publication status

published

Related records

Related persons

Tomislava Lauc (author)

Strahil Ristov (author)

Damir Boras (author)

Related institutions

Filozofski fakultet u Zagrebu (Faculty of Humanities and Social Sciences, University of Zagreb) (130) (author's institution)

Field

Information and communication sciences