Deep learning approach to determining the type of long reads

Vrček, Lovro; Huang, Megan Hong Hui; Vaser, Robert; Šikić, Mile

Bibliographic record number: 1109269

Deep learning approach to determining the type of long reads // International Conference on Intelligent Systems for Molecular Biology 2020
Online; conference, 2020 (poster, international peer review, unpublished work, other)

CROSBI ID: 1109269

Title
Deep learning approach to determining the type of long reads

Authors
Vrček, Lovro ; Huang, Megan Hong Hui ; Vaser, Robert ; Šikić, Mile

Type, subtype, and category of work
Conference abstracts, unpublished work, other

Conference
International Conference on Intelligent Systems for Molecular Biology 2020

Place and date
Online; conference, 13 to 16 July 2020

Type of participation
Poster

Type of peer review
International peer review

Keywords
De novo assembly ; Deep learning ; Graph simplification ; Read classification

Abstract
De novo assembly of long reads, for both single genomes and metagenomes, remains one of the most difficult problems in bioinformatics. A commonly used paradigm, Overlap-Layout-Consensus, aims to find a Hamiltonian path through an assembly graph built from overlapping reads in a sample. However, these graphs can be extremely complex because of repetitive regions in genomes and sequencing artifacts such as chimeric reads, which lead to more fragmented assembled genomes. A popular approach to this problem divides reads into three categories (regular, repetitive, and chimeric) and processes each appropriately. A drawback of the heuristic read classification used in existing assemblers is that its parameters are selected manually, based on only a few genomes. In this work, we propose a deep learning approach that classifies reads based on their pile-o-grams, which are plots of coverage versus base index. The model was trained on a hand-labeled dataset of pile-o-gram images from multiple bacteria and tested on a different bacterial species not included in the training set. With this setup, and with balanced classes, the model achieved an accuracy of 93%, which opens the possibility of creating more accurate and more contiguous assemblies.
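
As an illustration of the "coverage versus base index" idea, the following is a minimal sketch of how a pile-o-gram could be computed for one read. The abstract does not say how the authors build their pile-o-grams, so the function name `pile_o_gram`, the half-open interval convention, and the input format are all assumptions made for this example; the overlap intervals would come from a separate overlapper.

```python
from typing import List, Tuple


def pile_o_gram(read_length: int, overlaps: List[Tuple[int, int]]) -> List[int]:
    """Return per-base coverage for a single read.

    `overlaps` holds half-open intervals [start, end) in read coordinates,
    one per overlapping read (hypothetical input format, not from the source).
    """
    # Difference array: +1 where an overlap begins, -1 just past where it ends.
    diff = [0] * (read_length + 1)
    for start, end in overlaps:
        # Clip intervals to the read so malformed input cannot index out of range.
        start = max(0, start)
        end = min(read_length, end)
        if start >= end:
            continue
        diff[start] += 1
        diff[end] -= 1

    # A running sum over the difference array gives coverage at each base.
    coverage = []
    running = 0
    for delta in diff[:read_length]:
        running += delta
        coverage.append(running)
    return coverage


if __name__ == "__main__":
    # Three overlaps on a 10-base toy read.
    print(pile_o_gram(10, [(0, 6), (2, 10), (4, 8)]))
    # prints [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]
```

Plotting the returned list against its indices gives the coverage curve; rendering that curve as an image would be one way to obtain classifier input of the kind the abstract describes.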

Original language
English

Scientific fields
Biology, Computer Science

AFFILIATIONS

Institutions:
Fakultet elektrotehnike i računarstva (Faculty of Electrical Engineering and Computing), Zagreb

Profiles:

Mile Šikić (author)

Robert Vaser (author)

Lovro Vrček (author)
