Bibliographic record no. 915070
De Novo Assembly using Unsupervised Read Categorization
De Novo Assembly using Unsupervised Read Categorization // Second International Workshop on Data Science / Lončarić, Sven ; Šmuc, Tomislav (eds.).
Zagreb, 2017. pp. 69-72 (poster, international peer review, extended abstract, scientific)
CROSBI ID: 915070
Title
De Novo Assembly using Unsupervised Read Categorization
Authors
Tomljanović, Jan ; Šebrek, Tomislav ; Šikić, Mile
Type, subtype and category of work
Conference abstracts, extended abstract, scientific
Source
Second International Workshop on Data Science
/ Lončarić, Sven ; Šmuc, Tomislav (eds.) - Zagreb, 2017, pp. 69-72
Conference
Second International Workshop on Data Science
Place and date
Zagreb, Croatia, 30.11.2017
Type of participation
Poster
Type of peer review
International peer review
Keywords
Deep learning ; Unsupervised learning ; De novo assembly ; Chimeric read ; Repeat read
Abstract
In this work, we present a novel method for de novo genome assembly based on the detection of chimeric and repeat reads. This information facilitates the detection of unique sequences, which results in more contiguous final sequences. We showed that read types can be separated by transforming the coverage graph of each read into a 1D signal, and we found that the signals of repeat and chimeric reads differ significantly from those of regular reads. Because manually determining the correct read types is tedious and time-consuming, we chose unsupervised learning. For feature extraction, we applied and compared variational and denoising autoencoders; clustering was performed with the K-means algorithm. We tested the method on four bacterial genomes sequenced with Pacific Biosciences devices. The results show that using labelled read types can significantly improve the contiguity of the assembled final sequence.
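The pipeline described in the abstract (per-read coverage signal, then features, then K-means) can be illustrated with a minimal, dependency-free sketch. It rests on several assumptions that are not from the original work. Each read's overlaps are given as `[start, end)` intervals in read coordinates. The variational/denoising autoencoder features used by the authors are replaced with a much simpler stand-in: a binned coverage vector normalised by mean depth. The toy read shapes are assumed for illustration: a coverage gap at a chimeric junction and a coverage peak over a repeat. All names (`coverage_signal`, `to_feature`, `kmeans`, `n_bins`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # [start, end) span of one overlap, in read coordinates


def coverage_signal(read_length: int, overlaps: Sequence[Interval]) -> List[int]:
    """1D per-base coverage signal of a read, built from its overlap spans."""
    diff = [0] * (read_length + 1)  # difference array: +1 at span start, -1 at span end
    for start, end in overlaps:
        start, end = max(0, start), min(read_length, end)  # clip spans to the read
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    signal, depth = [], 0
    for i in range(read_length):
        depth += diff[i]  # running prefix sum = coverage depth at base i
        signal.append(depth)
    return signal


def to_feature(signal: Sequence[int], n_bins: int = 8) -> List[float]:
    """Fixed-length feature: mean coverage in n_bins equal bins, divided by the overall mean.

    Hypothetical stand-in for the autoencoder features described in the abstract."""
    n = len(signal)
    bins = []
    for b in range(n_bins):
        lo, hi = b * n // n_bins, (b + 1) * n // n_bins
        chunk = list(signal[lo:hi]) or [0]  # guard against reads shorter than n_bins
        bins.append(sum(chunk) / len(chunk))
    mean = sum(bins) / n_bins or 1.0  # avoid division by zero for uncovered reads
    return [x / mean for x in bins]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int, iterations: int = 100) -> List[int]:
    """Lloyd's K-means; returns one cluster label per point."""
    # Deterministic farthest-first initialisation (an assumption; k-means++ is the usual choice).
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels: List[int] = []
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    L = 800
    # Toy overlap sets with assumed shapes: uniform coverage (regular read),
    # a gap at the junction (chimeric read), a local peak (repeat read).
    reads = {
        "regular": [(0, L)] * 4,
        "chimeric": [(0, 350), (0, 380), (420, L), (450, L)],
        "repeat": [(0, L)] * 4 + [(200, 400)] * 8,
    }
    feats = [to_feature(coverage_signal(L, ov)) for ov in reads.values()]
    print(dict(zip(reads, kmeans(feats, k=3))))
```

Dividing by mean depth makes the feature describe the shape of the signal rather than absolute coverage. A read sequenced at twice the depth therefore lands in the same cluster. K-means only returns cluster indices, so mapping clusters to read types (regular, chimeric, repeat) would still require interpretation, for example by inspecting the cluster centroids.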
Original language
English
Scientific fields
Biology, Computer Science