Bibliographic record no.: 314312
N-Grams and Morphological Normalization in Text Classification: A Comparison on a Croatian-English Parallel Corpus
N-Grams and Morphological Normalization in Text Classification: A Comparison on a Croatian-English Parallel Corpus // Lecture notes in computer science, 4874 (2007), 671-682 doi:10.1007/978-3-540-77002-2_56 (international peer review, article, scientific)
CROSBI ID: 314312
Title
N-Grams and Morphological Normalization in Text Classification: A Comparison on a Croatian-English Parallel Corpus
Authors
Šilić, Artur ; Chauchat, Jean-Hugues ; Dalbelo Bašić, Bojana ; Morin, Annie
Source
Lecture notes in computer science (0302-9743) 4874 (2007); 671-682
Type, subtype, and category of work
Journal papers, article, scientific
Keywords
text classification ; morphological normalization ; stemming ; n-grams ; SVM ; text representation ; performance
Abstract
In this paper we compare n-grams and morphological normalization, two inherently different text-preprocessing methods, used for text classification on a Croatian-English parallel corpus. Our approach to comparing different text preprocessing techniques is based on measuring computational performance (execution time and memory consumption), as well as classification performance. We show that although n-grams achieve classifier performance comparable to traditional word-based feature extraction and can act as a substitute for morphological normalization, they are computationally much more demanding.
Original language
English
Scientific fields
Computer science, Information and communication sciences
WORK AFFILIATIONS
Projects:
MZO-ZP-036-1300646-1986 - Knowledge discovery in textual data (Dalbelo-Bašić, Bojana, MZO) (CroRIS)
MZOS-098-0982560-2563 - Machine learning algorithms and their application (Gamberger, Dragan, MZOS) (CroRIS)
Institutions:
Faculty of Electrical Engineering and Computing, Zagreb
Ruđer Bošković Institute, Zagreb
Journal indexed in:
- Scopus
Included in other bibliographic databases:
- Compu-Math Citation Index
- INSPEC