A Note on Indexing DNA and Protein Sequences (CROSBI ID 495206)
Conference contribution in proceedings | original scientific paper | international peer review
Responsibility details
Ristov, Strahil
English
A Note on Indexing DNA and Protein Sequences
Many applications in computational biology rely on indexing biological sequences. Indexing the sequences greatly reduces the time complexity of a search. However, good index structures, such as suffix trees, require inordinate amounts of space. We describe work in progress on a new approach to indexing that uses a truncated suffix tree implemented with an LZ-compressed trie. The index would require about 4 bytes per symbol for the largest collection of protein sequences (over 450 M amino acids) and about 5 bytes per symbol for the largest collection of DNA sequences (over 20 G bases).
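As a rough illustration of the truncated suffix tree idea mentioned in the abstract, the following minimal sketch inserts every suffix of a sequence into a trie but cuts each suffix off at a fixed depth. It is an assumption-laden toy, not the author's method: it uses a plain, uncompressed nested-dict trie rather than the LZ-compressed trie the paper describes, and the names `build_truncated_suffix_trie`, `find_occurrences`, and `max_depth` are hypothetical.

```python
def _new_node():
    # Each node keeps its children and the start positions of the
    # truncated suffixes that end exactly at this node.
    return {"children": {}, "positions": []}


def build_truncated_suffix_trie(text, max_depth):
    """Insert every suffix of `text`, truncated to `max_depth` symbols."""
    root = _new_node()
    for start in range(len(text)):
        node = root
        for symbol in text[start:start + max_depth]:
            node = node["children"].setdefault(symbol, _new_node())
        node["positions"].append(start)
    return root


def find_occurrences(root, pattern, max_depth):
    """Return sorted start positions of `pattern` (length <= max_depth)."""
    if len(pattern) > max_depth:
        raise ValueError("pattern is longer than the truncation depth")
    node = root
    for symbol in pattern:
        node = node["children"].get(symbol)
        if node is None:
            return []
    # Every truncated suffix stored in this subtree starts with `pattern`.
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        found.extend(current["positions"])
        stack.extend(current["children"].values())
    return sorted(found)


trie = build_truncated_suffix_trie("GATTACAGATTACA", max_depth=4)
hits = find_occurrences(trie, "ATTA", max_depth=4)
```

Truncation bounds the depth of the trie, which trades the ability to match patterns longer than `max_depth` directly for a smaller structure. Longer patterns would need an extra verification step against the sequence, which this sketch leaves out.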
DNA indexing; protein sequence indexing; suffix trees; LZ trie; sequence matching; truncated suffix tree; suffix sequoia
Contribution details
121-126-x.
2003.
published
Parent publication details
Proceedings 6th Intl. Multi-Conference Information Society IS 2003, Vol A, Intelligent and Computer Systems
Bohanec, Marko ; Filipič, Bogdan ; Gams, Matjaž
Ljubljana: Institut Jožef Stefan
Conference details
6th International Multi-Conference Information Society IS 2003, Intelligent and Computer Systems
lecture
13.10.2003-17.10.2003
Ljubljana, Slovenia