Bibliographic record number: 533269
Distance Measures and Machine Learning Approaches for Codon Usage Analyses
Distance Measures and Machine Learning Approaches for Codon Usage Analyses // Codon Evolution - Mechanisms and Models / Cannarozzi, Gina ; Schneider, Adrian (eds.).
Oxford: Oxford University Press, 2011. pp. 229-244
CROSBI ID: 533269
Title
Distance Measures and Machine Learning Approaches for Codon Usage Analyses
Authors
Supek, Fran ; Šmuc, Tomislav
Type, subtype and category of work
Book chapters, review
Book
Codon Evolution - Mechanisms and Models
Editor(s)
Cannarozzi, Gina ; Schneider, Adrian
Publisher
Oxford University Press
City
Oxford
Year
2011
Page range
229-244
ISBN
9780199601665
Keywords
codon bias, supervised machine learning, translational selection, highly expressed genes, Random Forest
Abstract
Unequal use of synonymous codons is a widespread phenomenon caused largely by directional mutation pressures, but also by natural selection for speed and/or accuracy of protein translation. Much effort was dedicated to investigate whether this 'translational selection' had an influence on codon choice of highly expressed genes in various genomes. In such analyses genes are typically represented as vectors of codon frequencies, and the data analyzed using multivariate techniques, commonly either (a) dimensionality reduction, e.g. correspondence analysis, or (b) distance measures in the codon frequency space, such as the codon adaptation index (CAI). Such representations of data can be challenging as genes are too short to allow precise estimation of codon frequencies, introducing noise and consequently leading to serious artifacts in some commonly used methods. A supervised machine learning approach, as embodied in the use of a classifier, provides an alternative more robust to noise and also more sensitive in detecting codon biases. We describe a Random Forest-based computational framework that enables control over confounding factors (here, the background nucleotide substitution patterns) while reliably detecting translational selection, demonstrated on a large set of prokaryotic genomes.
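As an illustration of the codon-frequency representation and the codon adaptation index (CAI) named in the abstract, the following is a minimal sketch and not the chapter's own implementation. It assumes the standard genetic code, the usual textbook definition of CAI (geometric mean of each codon's frequency relative to the most frequent synonymous codon in a reference set of highly expressed genes), and a pseudocount of 0.5 to avoid zero weights. The function names `codon_counts`, `relative_adaptiveness` and `cai` are hypothetical, and the Random Forest framework described in the abstract is not reproduced here.

```python
import math
from collections import Counter

# Standard genetic code, codons enumerated in T, C, A, G order per position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codon_counts(seq):
    """Count in-frame codons of a coding sequence (the gene's codon vector)."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight of each codon relative to the most used synonymous codon.

    The pseudocount is an assumption of this sketch; it keeps weights
    positive when a codon is absent from the reference set.
    """
    counts = Counter()
    for gene in reference_genes:
        counts.update(codon_counts(gene))
    weights = {}
    # Stop codons, Met and Trp are skipped: they have no synonymous choice.
    for aa in set(AMINO) - set("*MW"):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        top = max(counts[c] for c in family)
        for c in family:
            weights[c] = (counts[c] + pseudocount) / (top + pseudocount)
    return weights


def cai(seq, weights):
    """Geometric mean of the codon weights over one gene."""
    counts = codon_counts(seq)
    scored = {c: n for c, n in counts.items() if c in weights}
    total = sum(scored.values())
    if total == 0:
        return float("nan")
    log_sum = sum(n * math.log(weights[c]) for c, n in scored.items())
    return math.exp(log_sum / total)


if __name__ == "__main__":
    # Toy reference set (made-up sequence, for illustration only).
    w = relative_adaptiveness(["GCTGCTGCTGCC"])
    print(cai("GCTGCT", w), cai("GCCGCC", w))
```

With short genes the counts behind each weight are small, which is the estimation noise the abstract refers to; the toy reference above contains only four codons, so the resulting weights are purely illustrative.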
Original language
English
Scientific fields
Biology, Computer Science
WORK AFFILIATIONS
Projects:
098-0000000-3168 - Machine Learning of Predictive Models in Computational Biology (Šmuc, Tomislav, MZOS) (CroRIS)
Institutions:
Ruđer Bošković Institute, Zagreb