Bibliographic record number: 296981
Approximate Representation of Textual Documents in the Concept Space
Approximate Representation of Textual Documents in the Concept Space // Informatica, 31 (2007), 1; 21-27 (international peer review, article, scientific)
CROSBI ID: 296981
Title
Approximate Representation of Textual Documents in the Concept Space
Authors
Dobša, Jasminka ; Dalbelo-Bašić, Bojana
Source
Informatica (0350-5596) 31 (2007), 1; 21-27
Type, subtype and category of work
Journal papers, article, scientific
Keywords
dimensionality reduction; concept decomposition; information retrieval
Abstract
In this paper we address the problem of adding new documents to a collection when the documents are represented in a lower-dimensional space by concept indexing. Concept indexing (CI) is a feature construction method that relies on the concept decomposition of the term-document matrix. With CI, the original document representations are projected onto the space spanned by the centroids of clusters, which are called concept vectors. This problem is especially relevant for applications on the World Wide Web. The proposed methods are tested on the task of information retrieval. The vectors onto which documents are projected during dimension reduction are constructed from the representations of all documents in the collection, so computing new representations in the reduced space requires recomputing the concept decomposition. The solution to this problem is to develop methods that give an approximate representation of newly added documents in the reduced space. The paper introduces two methods for adding new documents to the reduced space. In the first method no new index terms are added, and the added documents are represented by the existing list of index terms; in the second method the list of index terms is extended, and the representations of documents and concept vectors are extended in the dimensions of the newly added terms. It is shown that representing documents by the extended list of index terms does not significantly improve information retrieval performance.
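The idea behind the first method (no new index terms) can be illustrated with a minimal sketch. The abstract does not give the authors' formulas, so everything below is an assumption: concept vectors are obtained with a plain spherical k-means, documents are represented by least-squares coordinates in the span of the concept vectors, and a new document is "folded in" by projecting it onto the existing concept vectors without recomputing the decomposition. The function names `normalize_columns`, `concept_vectors`, `project` and `fold_in` are hypothetical.

```python
import numpy as np


def normalize_columns(M):
    """Scale each column of a 2-D array to unit Euclidean length."""
    norms = np.linalg.norm(M, axis=0)
    norms[norms == 0] = 1.0  # leave all-zero columns unchanged
    return M / norms


def concept_vectors(A, k, n_iter=20, seed=0):
    """Assumed variant: spherical k-means on a terms x documents matrix A.

    Returns a terms x k matrix whose columns are normalized cluster centroids.
    """
    rng = np.random.default_rng(seed)
    A = normalize_columns(A)
    # Start from k distinct documents chosen at random.
    C = A[:, rng.choice(A.shape[1], size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every document to the concept vector with the largest cosine.
        labels = np.argmax(C.T @ A, axis=0)
        for j in range(k):
            members = A[:, labels == j]
            if members.shape[1] > 0:  # keep the old centroid if a cluster is empty
                C[:, j] = members.sum(axis=1)
        C = normalize_columns(C)
    return C


def project(C, docs):
    """Least-squares coordinates of the document columns in span(C)."""
    coords, *_ = np.linalg.lstsq(C, docs, rcond=None)
    return coords


def fold_in(C, new_docs):
    """Approximate representation of new documents (terms x n, 2-D array).

    The existing concept vectors are reused, so nothing is recomputed.
    """
    return project(C, normalize_columns(new_docs))


# Toy collection: 5 index terms, 6 documents (made-up counts).
A = np.array([
    [2, 1, 3, 0, 0, 0],
    [1, 2, 1, 0, 0, 1],
    [0, 0, 0, 1, 2, 3],
    [0, 1, 0, 3, 1, 2],
    [1, 0, 0, 0, 1, 0],
], dtype=float)

C = concept_vectors(A, k=2)
reduced_docs = project(C, normalize_columns(A))   # 2 x 6 representation
new_doc = np.array([[1.0], [1.0], [0.0], [0.0], [0.0]])
reduced_new = fold_in(C, new_doc)                 # 2 x 1 representation
```

The trade-off in this sketch is the one the abstract describes: `fold_in` is cheap because it reuses `C`, but `C` no longer reflects the added documents, so the result is only an approximation of what a full recomputation would give.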
Original language
English
Scientific fields
Computing
AFFILIATIONS OF THE WORK
Projects:
016-0161741-1739 - Development of Information Infrastructure and Deductive Mechanisms of the Semantic Web (Čubrilo, Mirko, MZOS)
016-0361935-1728 - Semantic Modelling of Multi-Agent Systems (Maleković, Mirko, MZOS)
036-1300646-1986 - Knowledge Discovery in Textual Data (Dalbelo-Bašić, Bojana, MZO)
Institutions:
Faculty of Organization and Informatics, Varaždin
Faculty of Electrical Engineering and Computing, Zagreb
The journal is indexed in:
- Web of Science Core Collection (WoSCC)
- Emerging Sources Citation Index (ESCI)
- Scopus
Included in other bibliographic databases:
- The INSPEC Science Abstracts series
- Linguistics and Language Behavior Abstracts
- Mathematical Reviews
- Scopus
- Compendex
- Computer & Information Systems Abstracts