Bibliographic record number: 1074614

Sentence Retrieval using Stemming and Lemmatization with Different Length of the Queries


Boban, Ivan; Doko, Alen; Gotovac, Sven
Sentence Retrieval using Stemming and Lemmatization with Different Length of the Queries // Advances in Science, Technology and Engineering Systems Journal, 5 (2020), 3; 349-354 doi:10.25046/aj050345 (international peer review, article, scientific)


CROSBI ID: 1074614

Title
Sentence Retrieval using Stemming and Lemmatization with Different Length of the Queries

Authors
Boban, Ivan; Doko, Alen; Gotovac, Sven

Source
Advances in Science, Technology and Engineering Systems Journal (2415-6698) 5 (2020), 3; 349-354

Type, subtype and category of work
Journal papers, article, scientific

Keywords
Sentence retrieval; TF-ISF; Data pre-processing; Stemming; Lemmatization

Abstract
In this paper we focus on sentence retrieval, which is similar to document retrieval but with a smaller unit of retrieval. Using data pre-processing in document retrieval is generally considered useful. When it comes to sentence retrieval, the situation is less clear. In this paper we use the TF-ISF (term frequency – inverse sentence frequency) method for sentence retrieval. As pre-processing steps, we use stop word removal and language modeling techniques: stemming and lemmatization. We also experiment with different query lengths. The results show that data pre-processing with stemming and lemmatization is as useful for sentence retrieval as it is for document retrieval. Lemmatization produces better results with longer queries, while stemming shows worse results with longer queries. For the experiment we used data from the Text Retrieval Conference (TREC) novelty tracks.
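
The pipeline the abstract describes (stop word removal, term normalization, then TF-ISF ranking) can be illustrated with a minimal sketch. This record does not give the scoring formula or the tools the authors used, so everything below is an assumption: the score is one commonly cited TF-ISF formulation, `toy_stem` is a crude suffix stripper standing in for a real stemmer or lemmatizer, and the stop word list and all function names are hypothetical.

```python
import math
from collections import Counter

# Tiny illustrative stop word list (assumption, not the authors' list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to"}


def toy_stem(token):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text, normalize=toy_stem):
    """Lowercase, strip punctuation, drop stop words, normalize each term."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    return [normalize(t) for t in tokens if t and t not in STOP_WORDS]


def tf_isf_score(query_tokens, sentence_tokens, sentence_freq, n_sentences):
    """Sum over query terms of log(tf_q + 1) * log(tf_s + 1) * ISF."""
    q_tf = Counter(query_tokens)
    s_tf = Counter(sentence_tokens)
    score = 0.0
    for term, q_count in q_tf.items():
        # Inverse sentence frequency: rarer terms weigh more.
        isf = math.log((n_sentences + 1) / (0.5 + sentence_freq.get(term, 0)))
        score += math.log(q_count + 1) * math.log(s_tf[term] + 1) * isf
    return score


def rank_sentences(query, sentences, normalize=toy_stem):
    """Return (score, sentence) pairs sorted by descending TF-ISF score."""
    tokenized = [preprocess(s, normalize) for s in sentences]
    # Number of sentences in which each term occurs at least once.
    sentence_freq = Counter(term for toks in tokenized for term in set(toks))
    q = preprocess(query, normalize)
    scored = [
        (tf_isf_score(q, toks, sentence_freq, len(sentences)), s)
        for toks, s in zip(tokenized, sentences)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


sentences = [
    "The cat chased the mice.",
    "Stock markets closed higher.",
    "Dogs are barking loudly.",
]
ranked = rank_sentences("cats chasing", sentences)
unstemmed = rank_sentences("cats chasing", sentences, normalize=lambda t: t)
```

In this toy collection the stemmed query matches the first sentence, while the unnormalized query matches nothing, which illustrates why term normalization can matter for units as short as sentences. It does not reproduce the paper's experiments or results.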

Original language
English

Scientific fields
Computer science



AFFILIATIONS OF THE WORK


Institutions:
Fakultet elektrotehnike, strojarstva i brodogradnje, Split

Profiles:

Sven Gotovac (author)

Links to the full text of the work:

Full text access: doi:10.25046/aj050345; astesj.com

Cite this publication:

Boban, Ivan; Doko, Alen; Gotovac, Sven
Sentence Retrieval using Stemming and Lemmatization with Different Length of the Queries // Advances in Science, Technology and Engineering Systems Journal, 5 (2020), 3; 349-354 doi:10.25046/aj050345 (international peer review, article, scientific)
Boban, I., Doko, A. & Gotovac, S. (2020) Sentence Retrieval using Stemming and Lemmatization with Different Length of the Queries. Advances in Science, Technology and Engineering Systems Journal, 5 (3), 349-354 doi:10.25046/aj050345.
@article{article,
  author = {Boban, Ivan and Doko, Alen and Gotovac, Sven},
  title = {Sentence Retrieval using Stemming and Lemmatization with Different Length of the Queries},
  journal = {Advances in Science, Technology and Engineering Systems Journal},
  year = {2020},
  volume = {5},
  number = {3},
  pages = {349-354},
  issn = {2415-6698},
  doi = {10.25046/aj050345},
  keywords = {Sentence retrieval, TF-ISF, Data pre-processing, Stemming, Lemmatization}
}

Journal indexed in:


  • Scopus

