Data source: CROSBI

N-Grams and Morphological Normalization in Text Classification: A Comparison on a Croatian-English Parallel Corpus (CROSBI ID 134996)

Journal contribution | original scientific paper | international peer review

Šilić, Artur ; Chauchat, Jean-Hugues ; Dalbelo Bašić, Bojana ; Morin, Annie N-Grams and Morphological Normalization in Text Classification: A Comparison on a Croatian-English Parallel Corpus // Lecture notes in computer science, 4874 (2007), 671-682. doi: 10.1007/978-3-540-77002-2_56

Responsibility details

Šilić, Artur ; Chauchat, Jean-Hugues ; Dalbelo Bašić, Bojana ; Morin, Annie

English

N-Grams and Morphological Normalization in Text Classification: A Comparison on a Croatian-English Parallel Corpus

In this paper we compare n-grams and morphological normalization, two inherently different text-preprocessing methods, used for text classification on a Croatian-English parallel corpus. Our approach to comparing different text preprocessing techniques is based on measuring computational performance (execution time and memory consumption), as well as classification performance. We show that although n-grams achieve classifier performance comparable to traditional word-based feature extraction and can act as a substitute for morphological normalization, they are computationally much more demanding.
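The comparison described in the abstract (feature extraction by n-grams versus normalized words, judged on execution time and memory consumption) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of character n-grams with n = 3, and the `toy_normalize` suffix stripper are all assumptions made for illustration, and the suffix stripper is only a crude stand-in for real morphological normalization. Classification itself (the SVM step) is left out.

```python
import time
import tracemalloc
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lowercased text."""
    s = " ".join(text.lower().split())  # collapse runs of whitespace
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def toy_normalize(word, suffixes=("ing", "ed", "s")):
    """Hypothetical stand-in for morphological normalization.

    Strips at most one suffix; a real system would use a lemmatizer
    or a language-specific stemmer instead.
    """
    for suf in suffixes:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word


def word_features(text, normalize=None):
    """Count whitespace-separated words, optionally normalized."""
    words = text.lower().split()
    if normalize is not None:
        words = [normalize(w) for w in words]
    return Counter(words)


def measure(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    # Synthetic text, used only to make the measurements non-trivial.
    doc = "the classifier classified the documents " * 1000
    runs = [
        ("words", word_features, {}),
        ("normalized words", word_features, {"normalize": toy_normalize}),
        ("character 3-grams", char_ngrams, {"n": 3}),
    ]
    for name, fn, kwargs in runs:
        feats, secs, peak = measure(fn, doc, **kwargs)
        print(f"{name}: {len(feats)} features, {secs:.4f} s, {peak} bytes peak")
```

The sketch only shows how the two kinds of measurement could be taken side by side; the timings and feature counts it prints for synthetic text say nothing about the results reported in the paper.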

text classification ; morphological normalization ; stemming ; n-grams ; SVM ; text representation ; performance

Publication details

Volume: 4874

Year: 2007

Pages: 671-682

Status: published

ISSN: 0302-9743

e-ISSN: 1611-3349

DOI: 10.1007/978-3-540-77002-2_56

Related scientific fields

Information and Communication Sciences, Computing
