Bibliographic record number: 533269

Distance Measures and Machine Learning Approaches for Codon Usage Analyses


Supek, Fran; Šmuc, Tomislav
Distance Measures and Machine Learning Approaches for Codon Usage Analyses // Codon Evolution - Mechanisms and Models / Cannarozzi, Gina ; Schneider, Adrian (eds.).
Oxford: Oxford University Press, 2011. pp. 229-244


Title
Distance Measures and Machine Learning Approaches for Codon Usage Analyses

Authors
Supek, Fran ; Šmuc, Tomislav

Type, subtype and category of work
Book chapters, review

Book
Codon Evolution - Mechanisms and Models

Editor(s)
Cannarozzi, Gina ; Schneider, Adrian

Publisher
Oxford University Press

City
Oxford

Year
2011

Page range
229-244

ISBN
9780199601665

Keywords
Codon bias, supervised machine learning, translational selection, highly expressed genes, Random Forest

Abstract
Unequal use of synonymous codons is a widespread phenomenon caused largely by directional mutation pressures, but also by natural selection for speed and/or accuracy of protein translation. Much effort has been dedicated to investigating whether this 'translational selection' has influenced the codon choice of highly expressed genes in various genomes. In such analyses, genes are typically represented as vectors of codon frequencies, and the data are analyzed using multivariate techniques, commonly either (a) dimensionality reduction, e.g. correspondence analysis, or (b) distance measures in the codon frequency space, such as the codon adaptation index (CAI). Such representations of data can be challenging because genes are too short to allow precise estimation of codon frequencies, which introduces noise and consequently leads to serious artifacts in some commonly used methods. A supervised machine learning approach, as embodied in the use of a classifier, provides an alternative that is more robust to noise and also more sensitive in detecting codon biases. We describe a Random Forest-based computational framework that enables control over confounding factors (here, the background nucleotide substitution patterns) while reliably detecting translational selection, demonstrated on a large set of prokaryotic genomes.
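
The two representations named in the abstract, a gene as a vector of codon frequencies and the codon adaptation index (CAI), can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function names (`count_codons`, `codon_frequencies`, `relative_adaptiveness`, `cai`), the standard genetic code table, and the 0.5 pseudocount for unobserved codons are assumptions made for illustration. CAI is computed here in its usual form, as the geometric mean of each codon's count relative to the most frequent synonymous codon in a reference set of highly expressed genes. The Random Forest framework itself is not shown.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
# Standard genetic code in TCAG order; '*' marks stop codons (assumed table).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = dict(zip(CODONS, AMINO_ACIDS))


def count_codons(seq):
    """Count in-frame codons, ignoring any triplet with ambiguous bases."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    triplets = (seq[i:i + 3] for i in range(0, usable, 3))
    return Counter(t for t in triplets if t in CODON_TO_AA)


def codon_frequencies(seq):
    """Represent a gene as a 64-element vector of codon frequencies."""
    counts = count_codons(seq)
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Weight each codon by its count relative to the most used synonym."""
    counts = Counter()
    for seq in reference_seqs:
        counts.update(count_codons(seq))
    weights = {}
    for aa in set(AMINO_ACIDS) - {"*"}:
        family = [c for c in CODONS if CODON_TO_AA[c] == aa]
        if len(family) < 2:
            continue  # Met and Trp have no synonymous alternatives
        best = max(counts[c] for c in family)
        for c in family:
            # Unobserved codons get a pseudocount so the logarithm is defined.
            weights[c] = (counts[c] or pseudocount) / (best or pseudocount)
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of the codons used in a gene."""
    log_sum, n = 0.0, 0
    for codon, k in count_codons(seq).items():
        if codon in weights:
            log_sum += k * math.log(weights[codon])
            n += k
    return math.exp(log_sum / n) if n else float("nan")
```

With few codons per gene, each entry of the frequency vector rests on a handful of counts, which is the source of the noise the abstract refers to.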

Original language
English

Scientific fields
Biology, Computer science



WORK AFFILIATIONS


Project / theme
098-0000000-3168 - Strojno učenje prediktivnih modela u računalnoj biologiji (Machine learning of predictive models in computational biology) (Tomislav Šmuc)

Institutions
Institut "Ruđer Bošković", Zagreb