Algorithms for de novo assembly of large genomes (CROSBI ID 432199)
Thesis | doctoral dissertation
Responsibility information
Vaser, Robert
Šikić, Mile
English
Algorithms for de novo assembly of large genomes
The inability of DNA sequencing technologies to read entire molecules led to the development of methods that connect the obtained short fragments back together in a puzzle-like process. These methods are called assemblers, and their design is guided by the notion that similar fragments originate from the same region of the genome. This assumption often fails because of sequencing errors and the repetitive nature of the genome. The short fragments produced by the first two generations of sequencing cannot span moderately long repetitive regions and thus hinder complete assembly. The advent of new sequencing approaches, namely Pacific Biosciences and Oxford Nanopore Technologies, increased fragment lengths at the cost of higher error rates, but still eased the assembly problem considerably. The first assembly attempts applied various error correction approaches prior to assembly with the tools available at the time. Although several long-read assemblers have been proposed in recent years, they demand significant computational resources. The focus of this research is the development of memory-efficient and scalable algorithms for de novo assembly of large genomes using third-generation sequencing data without error correction of the input sequences. In the scope of the thesis we implemented three novel tools for genome assembly: a memory-friendly layout module called Rala, which builds the assembly graph from preprocessed sequences and resolves junctions in the graph with the help of force-directed placement; a fast and accurate consensus module called Racon, based on vectorized partial order alignment; and a complete de novo assembler called Raven, which competes with state-of-the-art assemblers in both quality and resource management.
de novo, assembly, long reads, PacBio, Oxford Nanopore, pile-o-gram, force directed layout, partial order alignment, vectorization
Publication details
83
19.12.2019.
defended
Degree-granting institution
Fakultet elektrotehnike i računarstva
Zagreb