Bibliographic record number: 314312

N-Grams and Morphological Normalization in Text Classification: A Comparison on a Croatian-English Parallel Corpus


Šilić, Artur; Chauchat, Jean-Hugues; Dalbelo Bašić, Bojana; Morin, Annie
N-Grams and Morphological Normalization in Text Classification: A Comparison on a Croatian-English Parallel Corpus // Lecture notes in computer science, 4874 (2007), 671-682 doi:10.1007/978-3-540-77002-2_56 (international peer review, article, scientific)


CROSBI ID: 314312

Title
N-Grams and Morphological Normalization in Text Classification: A Comparison on a Croatian-English Parallel Corpus

Authors
Šilić, Artur ; Chauchat, Jean-Hugues ; Dalbelo Bašić, Bojana ; Morin, Annie

Source
Lecture notes in computer science (0302-9743) 4874 (2007); 671-682

Type, subtype, and category of work
Journal papers, article, scientific

Keywords
text classification ; morphological normalization ; stemming ; n-grams ; SVM ; text representation ; performance

Abstract
In this paper we compare n-grams and morphological normalization, two inherently different text-preprocessing methods, used for text classification on a Croatian-English parallel corpus. Our approach to comparing different text preprocessing techniques is based on measuring computational performance (execution time and memory consumption), as well as classification performance. We show that although n-grams achieve classifier performance comparable to traditional word-based feature extraction and can act as a substitute for morphological normalization, they are computationally much more demanding.
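The contrast drawn in the abstract can be illustrated with a minimal sketch. It assumes character n-grams (the abstract does not say which kind of n-gram is used) and replaces a real morphological normalizer with a hypothetical toy suffix stripper. The names char_ngrams, word_features, toy_suffix_stripper, and measure are illustrative and do not come from the paper. The measure helper records wall-clock time and peak traced memory, which is only one possible way to estimate the computational cost the authors refer to.

```python
import time
import tracemalloc
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (assumed variant) in lowercased text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def toy_suffix_stripper(token):
    """Hypothetical stand-in for a real morphological normalizer or stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[:-len(suffix)]
    return token


def word_features(text, normalize=None):
    """Count word features, optionally mapping each token through a normalizer."""
    tokens = text.lower().split()
    if normalize is not None:
        tokens = [normalize(t) for t in tokens]
    return Counter(tokens)


def measure(fn, *args, **kwargs):
    """Return (result, elapsed seconds, peak traced bytes) for one call of fn."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


doc = "classifiers classified the classifying classifier"

ngram_counts, ngram_time, ngram_peak = measure(char_ngrams, doc, 3)
word_counts, word_time, word_peak = measure(
    word_features, doc, normalize=toy_suffix_stripper
)

print("distinct 3-grams:", len(ngram_counts))
print("distinct normalized words:", len(word_counts))
```

Even on this one-line document the n-gram representation yields more distinct features than the normalized word representation, which hints at why the n-gram feature space can be more expensive to build and store. The timings on such a tiny input are not meaningful and are shown only to demonstrate the measurement pattern.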

Original language
English

Scientific fields
Computing, Information and Communication Sciences



WORK AFFILIATIONS


Projects:
MZO-ZP-036-1300646-1986 - Knowledge Discovery in Textual Data (Dalbelo-Bašić, Bojana, MZO) (CroRIS)
MZOS-098-0982560-2563 - Machine Learning Algorithms and Their Application (Gamberger, Dragan, MZOS) (CroRIS)

Institutions:
Fakultet elektrotehnike i računarstva, Zagreb
Institut "Ruđer Bošković", Zagreb

Profiles:

Bojana Dalbelo Bašić (author)

Artur Šilić (author)

Links to the full text of the paper:

Full-text access: doi, link.springer.com

Cite this publication:

Šilić, Artur; Chauchat, Jean-Hugues; Dalbelo Bašić, Bojana; Morin, Annie
N-Grams and Morphological Normalization in Text Classification: A Comparison on a Croatian-English Parallel Corpus // Lecture notes in computer science, 4874 (2007), 671-682 doi:10.1007/978-3-540-77002-2_56 (international peer review, article, scientific)
Šilić, A., Chauchat, J., Dalbelo Bašić, B. & Morin, A. (2007) N-Grams and Morphological Normalization in Text Classification: A Comparison on a Croatian-English Parallel Corpus. Lecture notes in computer science, 4874, 671-682 doi:10.1007/978-3-540-77002-2_56.
@article{article,
  author = {\v{S}ili\'{c}, Artur and Chauchat, Jean-Hugues and Dalbelo Ba\v{s}i\'{c}, Bojana and Morin, Annie},
  title = {N-Grams and Morphological Normalization in Text Classification: A Comparison on a Croatian-English Parallel Corpus},
  journal = {Lecture notes in computer science},
  volume = {4874},
  year = {2007},
  pages = {671-682},
  issn = {0302-9743},
  doi = {10.1007/978-3-540-77002-2\_56},
  keywords = {text classification, morphological normalization, stemming, n-grams, SVM, text representation, performance}
}

Journal indexed by:


  • Scopus


Inclusion in other bibliographic databases:


  • Compu-Math Citation Index
  • INSPEC

