Bibliographic record no. 631030
From Short to Long Reads: Benchmarking Assembly Tools
From Short to Long Reads: Benchmarking Assembly Tools // ISMB/ECCB 2013
Berlin, Germany, 2013, pp. 1-1 (poster, international peer review, abstract, scientific)
CROSBI ID: 631030
Title
From Short to Long Reads: Benchmarking Assembly Tools
Authors
Sović, Ivan ; Skala, Karolj ; Šikić, Mile
Type, subtype and category of work
Conference abstracts, abstract, scientific
Source
ISMB/ECCB 2013
2013, pp. 1-1
Conference
ISMB/ECCB 2013
Place and date
Berlin, Germany, 20-24 July 2013
Type of participation
Poster
Type of review
International peer review
Keywords
DNA ; sequencing ; assembly ; tools ; long read ; benchmark ; N50 ; performance
Abstract
An increasing number of DNA de novo assembly tools are being developed, each claiming to outperform its competitors in some respect. Nevertheless, little attention has been paid to their comparative evaluation. Even where the quality of their results has been tested, information on their execution performance is hard to find. We designed a benchmarking methodology and applied it to several DNA de novo assembly tools. Unlike other comparative studies, our primary goal was to measure the assemblers’ resource consumption as a function of the length and coverage of the input reads. Since such a study is very time consuming, we have so far benchmarked a limited number of assemblers, and report the preliminary results here. We defined a collection of 77 datasets of simulated E. coli reads, designed to cover the space of varying read lengths and coverages. Benchmarking was performed on two de Bruijn graph (DBG) based assemblers, Velvet and SOAPdenovo, and two overlap graph (OG) based assemblers, SGA and Minimus. Preliminary results show that DBG-based assemblers generally run faster than OG-based ones. Additionally, the memory consumption of the DBG-based assemblers reaches a plateau at some point. The two tested OG-based assemblers differ in memory consumption, presumably because of their different underlying alignment algorithms. However, the DBG-based assemblers seem to produce much lower N50 and maximal contig lengths than the OG-based ones, especially for longer reads. We conclude that the OG approach is preferable for upcoming sequencing technologies that will produce longer reads.
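The abstract compares assemblers by N50 and maximal contig length without defining them. The sketch below uses the commonly accepted definition of N50 (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length). It is an illustration only, not the authors' evaluation code; the function name `n50` and the contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if it is empty).

    Assumes the common definition: walk the contigs from longest to
    shortest and return the length at which the running total first
    reaches half of the summed contig length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


# Made-up contig lengths, for illustration only (not data from the study).
example = [80, 70, 50, 40, 30, 20]
print("N50:", n50(example))
print("Maximal contig length:", max(example))
```

Under this definition, an assembly with fewer, longer contigs scores a higher N50 than a fragmented assembly of the same total length.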
Original language
English
Scientific fields
Biology, Computing
WORK AFFILIATIONS
Institutions:
Fakultet elektrotehnike i računarstva, Zagreb,
Institut "Ruđer Bošković", Zagreb