Finding Multiword Term Candidates in Croatian

Tadić, Marko; Šojat, Krešimir

Bibliographic record number: 126566

Finding Multiword Term Candidates in Croatian // Proceedings of Information Extraction for Slavic Languages 2003 Workshop (IESL2003)
Sofia: BAS, 2003, pp. 102-107 (lecture, international peer review, full paper (in extenso), scientific)

CROSBI ID: 126566

Title
Finding Multiword Term Candidates in Croatian

Authors
Tadić, Marko; Šojat, Krešimir

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of Information Extraction for Slavic Languages 2003 Workshop (IESL2003). Sofia: BAS, 2003, pp. 102-107

Conference
Information Extraction for Slavic Languages 2003 Workshop

Place and date
Borovets, Bulgaria, 8-9 September 2003

Type of participation
Lecture

Type of review
International peer review

Keywords
Croatian Language; multiword terms; term candidates; statistical processing; mutual information

Abstract
The paper presents research on the statistical processing of a corpus of Croatian texts, with the primary aim of finding statistically significant co-occurrences of n-grams of tokens (digrams, trigrams and tetragrams). The collocations found with this method form a list of candidates for multiword terminological units, which is submitted to terminologists for further processing, i.e. manual selection of the "real terms". The statistical measure of co-occurrence used is mutual information (MI3), accompanied by linguistic filters: stop-words and POS. The results on non-lemmatized material of a highly inflected language such as Croatian show that the MI measure alone is not sufficient to find a satisfactory number of multiword term candidates. In this case, absolute frequency combined with linguistic filtering techniques gives a broader list of candidates for real terms.
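
As a rough illustration of the kind of scoring the abstract describes, the sketch below ranks digrams by a cubed mutual information score after a stop-word filter. It is not taken from the paper: the record does not give the formula, so the commonly cited variant MI3 = log2(f(x,y)^3 / E), with E = f(x) * f(y) / N, is assumed here. The function name `mi3_digrams` and its parameters are hypothetical, and the POS filter is left out.

```python
import math
from collections import Counter


def mi3_digrams(tokens, stop_words=frozenset(), min_freq=1):
    """Rank adjacent token pairs (digrams) by an assumed MI3 score.

    Assumption: MI3 = log2(f_xy**3 / expected), where
    expected = f_x * f_y / N and N is the number of tokens.
    The paper's exact formula is not stated in this record.
    """
    n = len(tokens)
    unigram_freq = Counter(tokens)
    digram_freq = Counter(zip(tokens, tokens[1:]))

    scores = {}
    for (x, y), f_xy in digram_freq.items():
        # Frequency threshold: drop rare pairs.
        if f_xy < min_freq:
            continue
        # Stop-word filter: drop pairs containing a stop word.
        if x in stop_words or y in stop_words:
            continue
        expected = unigram_freq[x] * unigram_freq[y] / n
        scores[(x, y)] = math.log2(f_xy ** 3 / expected)

    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Cubing the observed frequency lowers the rank of very rare pairs relative to plain MI. On non-lemmatized text, each inflected form of a term is counted as a separate token, so its frequency is split across several digrams.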

Original language
English

Scientific fields
Philology

ASSOCIATIONS OF THE WORK

Projects:
0130418

Institutions:
Filozofski fakultet (Faculty of Humanities and Social Sciences), Zagreb

Profiles:

Marko Tadić (author)

Krešimir Šojat (author)
