Data source: CROSBI

SFQ: Constructing and Querying a Succinct Representation of FASTQ Files (CROSBI ID 312792)

Journal contribution | original scientific paper | international peer review

Bakarić, Robert ; Korenčić, Damir ; Hršak, Dalibor ; Ristov, Strahil SFQ: Constructing and Querying a Succinct Representation of FASTQ Files // Electronics (Basel), 11 (2022), 11; 1783, 12. doi: 10.3390/electronics11111783

Statement of responsibility

Bakarić, Robert ; Korenčić, Damir ; Hršak, Dalibor ; Ristov, Strahil

English

SFQ: Constructing and Querying a Succinct Representation of FASTQ Files

A large and ever-increasing quantity of high-throughput sequencing (HTS) data is stored in FASTQ files. Various methods for data compression are used to mitigate the storage and transmission costs, from the still prevalent general-purpose Gzip to state-of-the-art specialized methods. However, all of the existing methods for FASTQ file compression require a decompression stage before the HTS data can be used. This is particularly costly for random access to specific records in FASTQ files. We propose the sFASTQ format, a succinct representation of FASTQ files that can be used without decompression (i.e., the records can be retrieved and listed online), and that supports random access to individual records. The sFASTQ format can be searched on disk, which eliminates the need for any additional memory resources. The searchable sFASTQ archive is of comparable size to the corresponding Gzip file. sFASTQ format outputs (interleaved) FASTQ records to the STDOUT stream. We provide SFQ, software for the construction and usage of the sFASTQ format that supports variable-length reads, pairing of records, and both lossless and lossy compression of quality scores.
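The random-access cost mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the SFQ implementation and does not use the sFASTQ format: it only contrasts seeking to a record in an uncompressed FASTQ file through a byte-offset index with reading a Gzip stream sequentially until the record is reached. The function names are hypothetical, and the sketch assumes strict four-line FASTQ records (no line-wrapped sequences).

```python
import gzip


def build_offset_index(path):
    """Return the starting byte offset of every 4-line record in a plain FASTQ file."""
    offsets = []
    with open(path, "rb") as fh:
        while True:
            pos = fh.tell()
            lines = [fh.readline() for _ in range(4)]
            if not lines[0]:  # end of file reached
                break
            offsets.append(pos)
    return offsets


def fetch_plain(path, offsets, k):
    """Fetch record k (0-based) with a single seek into the uncompressed file."""
    with open(path, "rb") as fh:
        fh.seek(offsets[k])
        return b"".join(fh.readline() for _ in range(4)).decode("ascii")


def fetch_gzip(path, k):
    """Fetch record k (0-based) from a Gzip file by decompressing everything before it."""
    with gzip.open(path, "rb") as fh:
        for _ in range(4 * k):  # skip the 4 lines of each preceding record
            fh.readline()
        return b"".join(fh.readline() for _ in range(4)).decode("ascii")
```

In this sketch `fetch_plain` does work proportional to one record once the index exists, while `fetch_gzip` must decompress all preceding records, which is the cost that a directly searchable archive is meant to avoid.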

bioinformatics ; FASTQ data compression ; random access

Publication details

Volume (issue): 11 (11)

Year: 2022

Article number: 1783

Number of pages: 12

Status: published

ISSN: 2079-9292

DOI: 10.3390/electronics11111783

Related scientific fields

Biology, Computing, Basic Technical Sciences
