N-Grams and Morphological Normalization in Text Classification: A Comparison on a Croatian-English Parallel Corpus (CROSBI ID 134996)
Journal contribution | original scientific paper | international peer review
Statement of responsibility
Šilić, Artur ; Chauchat, Jean-Hugues ; Dalbelo Bašić, Bojana ; Morin, Annie
English
N-Grams and Morphological Normalization in Text Classification: A Comparison on a Croatian-English Parallel Corpus
In this paper we compare n-grams and morphological normalization, two inherently different text-preprocessing methods, used for text classification on a Croatian-English parallel corpus. Our approach to comparing different text preprocessing techniques is based on measuring computational performance (execution time and memory consumption), as well as classification performance. We show that although n-grams achieve classifier performance comparable to traditional word-based feature extraction and can act as a substitute for morphological normalization, they are computationally much more demanding.
text classification ; morphological normalization ; stemming ; n-grams ; SVM ; text representation ; performance
Publication details
4874
2007.
671-682
published
0302-9743
1611-3349
10.1007/978-3-540-77002-2_56
Related research areas
Information and communication sciences, Computing
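Illustrative sketch
The abstract contrasts character n-grams with word-based feature extraction. The record does not describe the authors' implementation, so the following is only a minimal sketch of the general idea, with hypothetical helper names (char_ngrams, word_tokens, shared_ngrams) and an assumed n-gram length of 3. It shows why n-grams can stand in for morphological normalization: inflected forms of one word share most of their n-grams even though they are distinct word tokens.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of the lowercased text.

    Runs of whitespace are collapsed to a single space first.
    The default n=3 is an assumption for illustration only.
    """
    normalized = " ".join(text.lower().split())
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def word_tokens(text):
    """Return lowercased whitespace-separated tokens (no stemming applied)."""
    return text.lower().split()


def shared_ngrams(word_a, word_b, n=3):
    """Return the set of character n-grams that two words have in common."""
    return set(char_ngrams(word_a, n)) & set(char_ngrams(word_b, n))


if __name__ == "__main__":
    # Two inflected forms of the Croatian noun "knjiga" (book).
    forms = "knjiga knjige"

    # Word-based features treat the two forms as unrelated tokens.
    print(Counter(word_tokens(forms)))

    # Character trigrams overlap, linking the forms without a stemmer.
    print(sorted(shared_ngrams("knjiga", "knjige")))
```

The sketch also hints at the cost noted in the abstract: a text yields roughly one n-gram per character, so the feature space is typically much larger than with word tokens.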