Exploring String and Word Kernels on Croatian-English Parallel Corpus

Jonke, Zeno; Šilić, Artur; Dalbelo Bašić, Bojana

Bibliographic record number: 396419

Exploring String and Word Kernels on Croatian-English Parallel Corpus // Intelligent Systems MIPRO 2009
Rijeka: Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2009, pp. 308-311 (lecture, international peer review, full paper (in extenso), scientific)

CROSBI ID: 396419

Title
Exploring String and Word Kernels on Croatian-English Parallel Corpus

Authors
Jonke, Zeno ; Šilić, Artur ; Dalbelo Bašić, Bojana

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Intelligent Systems MIPRO 2009 / - Rijeka : Hrvatska udruga za informacijsku i komunikacijsku tehnologiju, elektroniku i mikroelektroniku - MIPRO, 2009, 308-311

Conference
International Conference MIPRO 2009

Place and date
Opatija, Croatia, 25.05.2009 - 29.05.2009

Type of participation
Lecture

Type of review
International peer review

Keywords
word kernels; string kernels; text classification

Abstract
In this paper we investigate the classification performance of kernel-based document representations, as well as the influence of kernel parameters on text classification in two morphologically different languages. We explore and compare two kernel functions that work at different levels of a sentence. The first is the gap-weighted kernel, a member of the string kernel family that operates at the character level and thus compares text documents by subsequences of characters. This removes the need for stemming or lemmatisation, since the kernel captures word stems automatically, which is very important when tools for stemming or lemmatisation are not available. The second method is the word sequence kernel, an extension of string kernels that works at the word level. This approach provides a more natural representation of the text and has the advantage of a smaller document representation, which in turn reduces computation time. The two methods are compared by exploring their parameter dependence and by measuring their classification performance on the Croatian-English parallel corpus.
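
The record does not include the authors' implementation, so the following is only a minimal sketch of the idea described in the abstract. It assumes the standard dynamic-programming formulation of the gap-weighted subsequence kernel; the function names, the subsequence length n and the decay factor lam are illustrative choices, not values taken from the paper. The same function serves both levels: passing a string compares characters, passing a list of tokens compares words.

```python
from math import sqrt


def gap_weighted_kernel(s, t, n, lam):
    """Unnormalised gap-weighted subsequence kernel of order n.

    s and t are indexable sequences: a str gives a character-level
    (string) kernel, a list of words gives a word-level kernel.
    lam in (0, 1] penalises the span covered by each subsequence.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if min(len(s), len(t)) < n:
        return 0.0
    rows, cols = len(s) + 1, len(t) + 1
    # kp[a][b] holds the auxiliary kernel K'_i(s[:a], t[:b]); K'_0 is 1.
    kp = [[1.0] * cols for _ in range(rows)]
    for _ in range(1, n):
        nxt = [[0.0] * cols for _ in range(rows)]
        for a in range(1, rows):
            kpp = 0.0  # running K''_i(s[:a], t[:b]) along the row
            for b in range(1, cols):
                kpp *= lam
                if s[a - 1] == t[b - 1]:
                    kpp += lam * lam * kp[a - 1][b - 1]
                nxt[a][b] = lam * nxt[a - 1][b] + kpp
        kp = nxt
    # Close each subsequence with a final matching pair of symbols.
    total = 0.0
    for a in range(1, rows):
        for b in range(1, cols):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[a - 1][b - 1]
    return total


def normalized_kernel(s, t, n, lam):
    """Kernel value scaled to [0, 1] so that document length cancels out."""
    denom = sqrt(gap_weighted_kernel(s, s, n, lam) * gap_weighted_kernel(t, t, n, lam))
    return gap_weighted_kernel(s, t, n, lam) / denom if denom > 0 else 0.0


# Character level: "cat" and "car" share only the subsequence "ca".
print(gap_weighted_kernel("cat", "car", 2, 0.5))  # 0.0625
# Word level: the same function applied to token lists.
print(gap_weighted_kernel("the cat sat".split(), "the cat ran".split(), 2, 0.5))  # 0.0625
# Inflected Croatian forms share stem subsequences without any stemming.
print(normalized_kernel("knjiga", "knjige", 3, 0.5))
```

At the word level the sequences are much shorter than at the character level, which is why the abstract expects lower computation time; the cost of this sketch grows with n times the product of the two sequence lengths.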

Original language
English

Scientific fields
Computer science

RELATED TO THIS WORK

Projects:
036-1300646-1986 - Otkrivanje znanja u tekstnim podacima (Knowledge Discovery in Textual Data) (Dalbelo-Bašić, Bojana, MZO) (CroRIS)

Institutions:
Faculty of Electrical Engineering and Computing, Zagreb

Profiles:

Bojana Dalbelo Bašić (author)

Artur Šilić (author)
