Bibliographic record number: 174659
Web indexing and search with local language support
Web indexing and search with local language support // Proceedings of SoftCOM 2003 / D. Begušić, N. Rožić (eds.).
Split: Fakultet elektrotehnike, strojarstva i brodogradnje Sveučilišta u Splitu, 2003, pp. 488-492 (lecture, international peer review, full paper (in extenso), professional)
CROSBI ID: 174659
Title
Web indexing and search with local language support
Authors
Krstinić, Damir ; Slapničar, Ivan
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), professional
Source
Proceedings of SoftCOM 2003
/ D. Begušić, N. Rožić - Split : Fakultet elektrotehnike, strojarstva i brodogradnje Sveučilišta u Splitu, 2003, 488-492
Conference
SoftCOM 2003.
Place and date
Ancona, Italy; Venice, Italy; Split, Croatia, 07.10.2003 - 10.10.2003
Type of participation
Lecture
Type of review
International peer review
Keywords
WWW; Internet; information retrieval; text search; vector spaces; latent semantic indexing; LSI; singular value decomposition; SVD; grammar; web spider
Abstract
Web search is becoming essential for everyday life, where a major need arises for extracting relevant knowledge from the enormous amounts of available data. In modern information retrieval systems, data is modeled as a term-by-document matrix. A user query is represented as a vector, and database search becomes a simple vector operation. The Latent Semantic Indexing (LSI) method reduces the size of the term-by-document matrix and improves the performance of the information retrieval system. The great majority of these systems are based on the English language. Although they are applicable to documents in other languages, they can suffer from incomplete term recognition. We focus on languages with a complex set of grammar rules, where improvement can be achieved by giving the indexing system basic knowledge of the language and the ability to recognize different forms of the same word. Using this technique, the original matrix can be reduced by an order of magnitude and important term-document connections strengthened. We are developing a web indexing engine with local language support using Ispell dictionary files. As part of this effort, Croatian language dictionary files have been developed.
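The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a tiny hand-written lemma table (`LEMMAS`, hypothetical) in place of real Ispell dictionary and affix files, uses raw term counts as matrix entries, and leaves out the SVD step of LSI entirely; the function names are illustrative only. It shows how mapping inflected forms to one base form shrinks the term-by-document matrix and lets a query match documents that contain other forms of the same word.

```python
import math
from collections import Counter

# Hypothetical lemma table mapping inflected Croatian forms to a base form.
# A real system would derive such a mapping from Ispell dictionary/affix files.
LEMMAS = {
    "knjige": "knjiga",
    "knjigu": "knjiga",
    "knjigama": "knjiga",
    "gradu": "grad",
    "grada": "grad",
    "gradovi": "grad",
}


def normalize(word, lemmas=None):
    """Lower-case a word and, if a lemma table is given, map it to its base form."""
    word = word.lower()
    return lemmas.get(word, word) if lemmas else word


def build_matrix(docs, lemmas=None):
    """Return (terms, matrix), where matrix[i][j] counts term i in document j."""
    counts = [Counter(normalize(w, lemmas) for w in doc.split()) for doc in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix


def cosine_scores(query, terms, matrix, lemmas=None):
    """Score every document (matrix column) against the query vector."""
    q_counts = Counter(normalize(w, lemmas) for w in query.split())
    q = [q_counts[t] for t in terms]
    q_norm = math.sqrt(sum(x * x for x in q))
    scores = []
    for j in range(len(matrix[0])):
        col = [row[j] for row in matrix]
        d_norm = math.sqrt(sum(x * x for x in col))
        dot = sum(a * b for a, b in zip(q, col))
        scores.append(dot / (q_norm * d_norm) if q_norm and d_norm else 0.0)
    return scores


DOCS = [
    "knjige u gradu",
    "knjigu volim",
    "gradovi i more",
]

# Without language support every inflected form is a separate matrix row.
raw_terms, raw_matrix = build_matrix(DOCS)
# With the lemma table, forms of the same word collapse into one row.
lem_terms, lem_matrix = build_matrix(DOCS, LEMMAS)

raw_scores = cosine_scores("knjiga", raw_terms, raw_matrix)
lem_scores = cosine_scores("knjiga", lem_terms, lem_matrix, LEMMAS)
```

In this toy collection the query "knjiga" matches nothing in the unnormalized matrix, because only the forms "knjige" and "knjigu" occur, while the normalized matrix has fewer rows and ranks the first two documents above the third. An LSI system would additionally apply a truncated SVD to the matrix before comparing vectors.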
Original language
English
Scientific fields
Mathematics