Bibliographic record no. 141082
A Note on Indexing DNA and Protein Sequences
A Note on Indexing DNA and Protein Sequences // Proceedings 6th Intl. Multi-Conference Information Society IS 2003, Vol A, Intelligent and Computer Systems / Bohanec, Marko ; Filipič, Bogdan ; Gams, Matjaž (eds.).
Ljubljana: Institut Jožef Stefan, 2003. pp. 121-126 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 141082
Title
A Note on Indexing DNA and Protein Sequences
Authors
Ristov, Strahil
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings 6th Intl. Multi-Conference Information Society IS 2003, Vol A, Intelligent and Computer Systems
/ Bohanec, Marko ; Filipič, Bogdan ; Gams, Matjaž - Ljubljana : Institut Jožef Stefan, 2003, 121-126
Conference
6th International Multi-Conference Information Society IS 2003, Intelligent and Computer Systems
Place and date
Ljubljana, Slovenia, 13.10.2003 - 17.10.2003
Type of participation
Lecture
Type of peer review
International peer review
Keywords
DNA indexing; protein sequence indexing; suffix trees; LZ trie; sequence matching; truncated suffix tree; suffix sequoia
Abstract
Many applications in computational biology rely on indexing biological sequences. An index greatly reduces the time complexity of searching the sequences. However, good index structures, such as suffix trees, require inordinate amounts of space. We describe work in progress on a new approach to indexing that uses a truncated suffix tree implemented as an LZ-compressed trie. The index would require about 4 bytes per symbol for the largest collection of protein sequences (over 450 M amino acids) and about 5 bytes per symbol for the largest collection of DNA sequences (over 20 G bases).
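The abstract names the core structure without showing it. As a rough illustration only, the Python sketch below builds a k-truncated suffix trie: every suffix of the text is inserted but cut to at most k symbols, so the node count is bounded by O(nk) instead of the O(n^2) worst case of a full suffix trie, and any pattern of length at most k is located with a single root-to-node walk. The class name `TruncatedSuffixTrie`, the depth parameter `k`, and the per-node `ends` position lists are hypothetical choices made for this sketch, not the paper's design. In particular, the LZ compression of the trie, which is what the paper relies on to reach about 4-5 bytes per symbol, is not shown; this pointer-based version needs far more memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict[str, Node] = field(default_factory=dict)
    # Start positions of the truncated suffixes that end exactly at this node.
    ends: list[int] = field(default_factory=list)


class TruncatedSuffixTrie:
    """Hypothetical sketch: suffix trie whose suffixes are cut to at most k symbols.

    Not LZ-compressed; intended only to show the truncation idea.
    """

    def __init__(self, text: str, k: int) -> None:
        if k <= 0:
            raise ValueError("k must be positive")
        self.k = k
        self.root = Node()
        for start in range(len(text)):
            node = self.root
            # Insert only the first k symbols of the suffix beginning at `start`.
            for ch in text[start:start + k]:
                node = node.children.setdefault(ch, Node())
            node.ends.append(start)

    def find(self, pattern: str) -> list[int]:
        """Return the sorted start positions of `pattern` (1 <= len <= k)."""
        if not 0 < len(pattern) <= self.k:
            raise ValueError(f"pattern length must be in 1..{self.k}")
        node = self.root
        for ch in pattern:
            node = node.children.get(ch)
            if node is None:
                return []
        # Every truncated suffix stored below this node starts with `pattern`.
        hits: list[int] = []
        stack = [node]
        while stack:
            current = stack.pop()
            hits.extend(current.ends)
            stack.extend(current.children.values())
        return sorted(hits)


if __name__ == "__main__":
    index = TruncatedSuffixTrie("ACGTACGA", k=3)
    print(index.find("ACG"))  # [0, 4]
    print(index.find("A"))    # [0, 4, 7]
```

The trade-off is that patterns longer than k cannot be answered from the trie alone; they would have to be verified against the text after looking up their first k symbols. For scale, the figures in the abstract imply roughly 4 × 450 M ≈ 1.8 GB for the protein index and 5 × 20 G ≈ 100 GB for the DNA index (decimal units, and lower bounds, since both collections are "over" those sizes).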
Original language
English
Scientific fields
Computing