Bibliographic record no. 584176

Evaluation of Classification Algorithms and Features for Collocation Extraction in Croatian


Karan, Mladen; Šnajder, Jan; Dalbelo Bašić, Bojana.
Evaluation of Classification Algorithms and Features for Collocation Extraction in Croatian // Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) / Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis (eds.).
Istanbul, Turkey: European Language Resources Association (ELRA), 2012. pp. 657-662 (poster, international peer review, full paper (in extenso), scientific)


CROSBI ID: 584176

Title
Evaluation of Classification Algorithms and Features for Collocation Extraction in Croatian

Authors
Karan, Mladen; Šnajder, Jan; Dalbelo Bašić, Bojana

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) / Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis (eds.). Istanbul, Turkey: European Language Resources Association (ELRA), 2012, pp. 657-662

ISBN
978-2-9517408-7-7

Conference
Eighth International Conference on Language Resources and Evaluation (LREC'12)

Place and date
Istanbul, Turkey, 23-25 May 2012

Type of participation
Poster

Type of review
International peer review

Keywords
collocation extraction; feature subset selection; Croatian language

Abstract
Collocations can be defined as words that occur together significantly more often than would be expected by chance. Many natural language processing applications, such as natural language generation, word sense disambiguation, and machine translation, can benefit from information about collocated words. We approach collocation extraction as a classification problem in which a given n-gram is classified as either a collocation (positive) or a non-collocation (negative). The features include word frequencies, classical association measures (Dice, PMI, χ²), and POS tags, as well as semantic word relatedness modeled by latent semantic analysis. We apply wrapper feature subset selection to determine the best set of features and compare the performance of several classification algorithms. Experiments are conducted on a manually annotated set of bigrams and trigrams sampled from a Croatian newspaper corpus. The best results are an F1 score of 79.8 for bigrams and 67.5 for trigrams. For bigrams the best classifier was an SVM, while for trigrams a decision tree performed best. The features that contributed most to overall performance were PMI, semantic relatedness, and POS information.
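As an illustration only (not the authors' code), the bigram association features named in the abstract can be computed from corpus counts using common textbook definitions, and a wrapper search can be sketched as greedy forward selection around any classifier-scoring function. The function names, the 2x2 contingency-table form of χ², and the choice of forward selection as the wrapper strategy are assumptions; the paper's exact variants (trigram extensions, the POS and LSA features, and its actual search procedure) may differ.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed count conventions (hypothetical, for illustration):
#   f_xy = count of bigram (x, y), f_x = count of x as first word,
#   f_y = count of y as second word, n = total number of bigrams.

def contingency(f_xy: int, f_x: int, f_y: int, n: int) -> Tuple[int, int, int, int]:
    """Observed 2x2 contingency table (O11, O12, O21, O22) for bigram (x, y)."""
    o11 = f_xy                  # x followed by y
    o12 = f_x - f_xy            # x followed by a word other than y
    o21 = f_y - f_xy            # y preceded by a word other than x
    o22 = n - f_x - f_y + f_xy  # neither x first nor y second
    return o11, o12, o21, o22

def pmi(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """Pointwise mutual information: log2(P(x, y) / (P(x) * P(y)))."""
    if f_xy == 0:
        return float("-inf")
    return math.log2(f_xy * n / (f_x * f_y))

def dice(f_xy: int, f_x: int, f_y: int) -> float:
    """Dice coefficient: 2 * f(x, y) / (f(x) + f(y))."""
    return 2 * f_xy / (f_x + f_y)

def chi_squared(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """Pearson's chi-squared statistic on the 2x2 contingency table."""
    o11, o12, o21, o22 = contingency(f_xy, f_x, f_y, n)
    num = n * (o11 * o22 - o12 * o21) ** 2
    den = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return num / den if den else 0.0

def bigram_features(f_xy: int, f_x: int, f_y: int, n: int) -> Dict[str, float]:
    """Feature vector for one candidate bigram (POS and LSA features omitted)."""
    return {
        "freq": float(f_xy),
        "pmi": pmi(f_xy, f_x, f_y, n),
        "dice": dice(f_xy, f_x, f_y),
        "chi2": chi_squared(f_xy, f_x, f_y, n),
    }

def forward_selection(
    features: Sequence[str], score: Callable[[List[str]], float]
) -> Tuple[List[str], float]:
    """Greedy forward wrapper selection: repeatedly add the feature whose
    inclusion gives the best classifier score; stop when nothing improves."""
    selected: List[str] = []
    remaining = list(features)
    best = float("-inf")
    while remaining:
        # One classifier evaluation (e.g. cross-validated F1) per candidate.
        cand_score, cand = max(
            ((score(selected + [f]), f) for f in remaining), key=lambda t: t[0]
        )
        if cand_score <= best:
            break
        best = cand_score
        selected.append(cand)
        remaining.remove(cand)
    return selected, best
```

In practice, `score` would train and evaluate a classifier (for example an SVM or a decision tree) on the chosen feature columns and return its cross-validated F1; here it is left as a parameter so the sketch stays self-contained. Forward selection is only one wrapper strategy; backward elimination or other searches fit the same interface.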

Original language
English

Scientific fields
Computing



RELATED RECORDS


Projects:
036-1300646-1986 - Otkrivanje znanja u tekstnim podacima (Knowledge Discovery in Text Data) (Dalbelo-Bašić, Bojana, MZO) (CroRIS)

Institutions:
Faculty of Electrical Engineering and Computing, Zagreb

Profiles:

Jan Šnajder (author)

Bojana Dalbelo Bašić (author)

Mladen Karan (author)

Links to the full text:

Full-text access: lrec.elra.info

Cite this publication:

Karan, M., Šnajder, J. & Dalbelo Bašić, B. (2012) Evaluation of Classification Algorithms and Features for Collocation Extraction in Croatian. In: Calzolari, N. (Conference Chair), Choukri, K., Declerck, T., Doğan, M. U., Maegaard, B., Mariani, J., Odijk, J. & Piperidis, S. (eds.) Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). Istanbul, Turkey: European Language Resources Association (ELRA), pp. 657-662.
@inproceedings{Karan2012,
  author    = {Karan, Mladen and \v{S}najder, Jan and Dalbelo Ba\v{s}i\'{c}, Bojana},
  title     = {Evaluation of Classification Algorithms and Features for Collocation Extraction in {C}roatian},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  editor    = {Calzolari, Nicoletta and Choukri, Khalid and Declerck, Thierry and Do\u{g}an, Mehmet U\u{g}ur and Maegaard, Bente and Mariani, Joseph and Odijk, Jan and Piperidis, Stelios},
  publisher = {European Language Resources Association (ELRA)},
  address   = {Istanbul, Turkey},
  year      = {2012},
  pages     = {657--662},
  isbn      = {978-2-9517408-7-7},
  keywords  = {collocation extraction, feature subset selection, Croatian language}
}



