Bibliographic record no. 1109269
Deep learning approach to determining the type of long reads
Deep learning approach to determining the type of long reads // International Conference on Intelligent Systems for Molecular Biology 2020
online; conference, 2020 (poster, international peer review, unpublished work, other)
CROSBI ID: 1109269
Title
Deep learning approach to determining the type of long reads
Authors
Vrček, Lovro ; Huang, Megan Hong Hui ; Vaser, Robert ; Šikić, Mile
Type, subtype and category of work
Conference abstracts, unpublished work, other
Conference
International Conference on Intelligent Systems for Molecular Biology 2020
Place and date
Online; conference, 13.07.2020 - 16.07.2020
Type of participation
Poster
Type of peer review
International peer review
Keywords
De novo assembly ; Deep learning ; Graph simplification ; Read classification
Abstract
Single-genome and metagenome de novo assembly of long reads is still one of the most difficult problems in bioinformatics. A commonly used paradigm, called Overlap-Layout-Consensus, aims to find a Hamiltonian path through an assembly graph built from the overlaps between reads in a sample. However, these graphs can be extremely complex due to repetitive regions in genomes and sequencing artifacts such as chimeric reads, which lead to higher fragmentation of the assembled genomes. A popular approach to tackling this problem is to divide reads into three categories (regular, repetitive, and chimeric) and to process each category appropriately. A drawback of the heuristic read-classification algorithms in existing assemblers is that their parameters are selected manually based on only a few genomes. In this work, we propose a deep learning approach for classifying reads based on their pile-o-grams, plots of coverage versus base index. The model was trained on a hand-labeled dataset of pile-o-gram images from multiple bacterial species and tested on a different bacterial species not included in the training set. With this setup and balanced classes, it achieved an accuracy of 93%, which opens the possibility of creating more accurate and more contiguous (less fragmented) assemblies.
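The abstract defines a pile-o-gram as a plot of coverage versus base index. As a minimal sketch, assuming the overlaps of one read with all other reads are already available as half-open `(begin, end)` intervals in that read's coordinates (a hypothetical input format; the record does not describe how the authors compute coverage), the per-base coverage profile can be built with a difference array in O(L + k) time for a read of length L with k overlaps. The function name `pile_o_gram` is illustrative only.

```python
from typing import List, Tuple


def pile_o_gram(read_length: int, overlaps: List[Tuple[int, int]]) -> List[int]:
    """Per-base coverage of a single read, given overlap intervals on it.

    Each overlap is an assumed half-open interval (begin, end) in the read's
    own coordinates, meaning another read covers bases begin..end-1.
    """
    # Difference array: +1 where an overlap starts, -1 just past where it ends.
    diff = [0] * (read_length + 1)
    for begin, end in overlaps:
        # Clamp the interval to the read boundaries and skip empty intervals.
        begin, end = max(0, begin), min(read_length, end)
        if begin < end:
            diff[begin] += 1
            diff[end] -= 1

    # A running prefix sum turns the difference array into coverage per base.
    coverage: List[int] = []
    running = 0
    for i in range(read_length):
        running += diff[i]
        coverage.append(running)
    return coverage


if __name__ == "__main__":
    print(pile_o_gram(10, [(0, 6), (4, 10), (2, 8)]))
```

Run as a script, the example prints `[1, 1, 2, 2, 3, 3, 2, 2, 1, 1]`. According to the abstract, the model was trained on pile-o-gram images, so a profile like this would be rendered as a plot before classification; that rendering step and the model itself are not shown here.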
Original language
English
Scientific fields
Biology, Computer science
RELATED INFORMATION
Institutions:
Fakultet elektrotehnike i računarstva (Faculty of Electrical Engineering and Computing), Zagreb