Bibliographic record no.: 915182
Algorithms for Layout Phase of De Novo Genome Assembly
Algorithms for Layout Phase of De Novo Genome Assembly // Second International Workshop on Data Science
Zagreb, Croatia, 2017, pp. 86-87 (poster, international peer review, extended abstract, scientific)
CROSBI ID: 915182
Title
Algorithms for Layout Phase of De Novo Genome Assembly
Authors
Vaser, Robert ; Šikić, Mile
Type, subtype and category of work
Conference abstracts, extended abstract, scientific
Source
Second International Workshop on Data Science
2017, pp. 86-87
Conference
Second International Workshop on Data Science
Place and date
Zagreb, Croatia, 30.11.2017
Type of participation
Poster
Type of review
International peer review
Keywords
de novo assembly, layout phase
Abstract
DNA sequencing and assembly are crucial parts of biological and medical research. Third-generation sequencing technologies have facilitated more contiguous assemblies thanks to the increase in read fragment lengths. Although the accuracy of such fragments is much lower than that of predecessor technologies, graph-based algorithms, among which the overlap-layout-consensus paradigm is the most notable, are able to assemble small to medium-sized genomes even without error correction. Here we present Rala, a standalone layout model intended for the assembly of raw reads produced by third-generation sequencing platforms. It consists of two parts: fragment preprocessing inspired by the assembler HINGE [1], and assembly graph simplification as described for the assembler Miniasm [2]. In preprocessing, pairwise overlaps between fragments are used to generate coverage graphs, which make it possible to distinguish between fragments. Fragments whose coverage graphs have sharp dips or peaks are chimeric, meaning they consist of two distinct parts of the genome, and are removed from the fragment set. Hills in coverage graphs indicate repetitive genomic regions and show whether a fragment bridges those regions. Overlaps between fragments that do not bridge repeats are removed as well. Afterwards, the assembly graph is built and simplified with transitive reduction, trimming, bubble popping and a heuristic that untangles leftover junctions in the graph. The whole implementation is publicly available at https://github.com/rvaser/rala under the MIT licence. As a side result, we show that the percentage of chimeric reads produced by either the Pacific Biosciences or Oxford Nanopore Technologies platforms is correlated with the fragment length.
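The preprocessing idea in the abstract, building a per-read coverage graph from pairwise overlaps and flagging reads with a sharp dip as chimeric, can be illustrated with a minimal Python sketch. This is not Rala's code or API: the function names, the window size and the thresholds (`window`, `ratio`, `min_flank`) are hypothetical values chosen only for illustration, and only dips (not peaks or repeat hills) are handled.

```python
from typing import List, Tuple


def coverage_graph(read_length: int, overlaps: List[Tuple[int, int]]) -> List[int]:
    """Per-base coverage of one read.

    `overlaps` holds half-open [begin, end) intervals on this read that are
    covered by overlaps with other reads (an assumed input representation).
    """
    diff = [0] * (read_length + 1)
    for begin, end in overlaps:
        diff[begin] += 1   # coverage rises where an overlap starts
        diff[end] -= 1     # and falls where it ends
    coverage = []
    depth = 0
    for delta in diff[:read_length]:
        depth += delta
        coverage.append(depth)
    return coverage


def has_sharp_dip(coverage: List[int], window: int = 10,
                  ratio: float = 0.25, min_flank: int = 4) -> bool:
    """Return True if some interior position is covered far less than both flanks.

    Hypothetical rule: the best coverage within `window` bases on each side
    must reach `min_flank`, and the position itself must be at most `ratio`
    times the weaker of the two flanks.
    """
    n = len(coverage)
    for i in range(window, n - window):
        left = max(coverage[i - window:i])
        right = max(coverage[i + 1:i + 1 + window])
        flank = min(left, right)
        if flank >= min_flank and coverage[i] <= ratio * flank:
            return True
    return False


# A read whose two halves are supported by disjoint sets of overlaps:
# nothing spans positions 48..51, so the coverage graph dips to zero there.
chimeric = coverage_graph(100, [(0, 48)] * 5 + [(52, 100)] * 5)

# A read with overlaps that span its whole length in a staggered fashion.
regular = coverage_graph(100, [(0, 60)] * 3 + [(20, 100)] * 3 + [(40, 90)] * 2)

print(has_sharp_dip(chimeric), has_sharp_dip(regular))
```

In this toy input the first read would be dropped from the fragment set and the second kept. A real filter would also need to account for low coverage near read ends and for noisy overlap boundaries, which the sketch ignores.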
Original language
English
Scientific fields
Biology, Computer Science
AFFILIATIONS
Institutions:
Faculty of Electrical Engineering and Computing, Zagreb,
Faculty of Electrical Engineering, Computer Science and Information Technology Osijek