Distance Measures and Machine Learning Approaches for Codon Usage Analyses (CROSBI ID 44075)
Book chapter | original scientific paper
Authorship
Supek, Fran ; Šmuc, Tomislav
English
Distance Measures and Machine Learning Approaches for Codon Usage Analyses
Unequal use of synonymous codons is a widespread phenomenon caused largely by directional mutation pressures, but also by natural selection for the speed and/or accuracy of protein translation. Much effort has been dedicated to investigating whether this 'translational selection' has influenced the codon choice of highly expressed genes in various genomes. In such analyses, genes are typically represented as vectors of codon frequencies, and the data are analyzed using multivariate techniques, commonly either (a) dimensionality reduction, e.g. correspondence analysis, or (b) distance measures in the codon frequency space, such as the codon adaptation index (CAI). Such representations can be problematic because genes are often too short to allow precise estimation of codon frequencies; the resulting noise leads to serious artifacts in some commonly used methods. A supervised machine learning approach, as embodied in the use of a classifier, provides an alternative that is more robust to noise and more sensitive in detecting codon biases. We describe a Random Forest-based computational framework that enables control over confounding factors (here, the background nucleotide substitution patterns) while reliably detecting translational selection, demonstrated on a large set of prokaryotic genomes.
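As an illustration of the representation the abstract mentions, the following minimal Python sketch turns a coding sequence into a vector of codon frequencies and scores it with the codon adaptation index in its commonly cited form (the geometric mean of per-codon relative adaptiveness values). It is not the chapter's Random Forest framework. The function names, the `pseudocount` of 0.5 for codons absent from the reference set, and the toy sequences are assumptions made for this example only.

```python
from collections import Counter
from itertools import product
from math import exp, log

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(seq):
    """Count in-frame codons, skipping any triplet with non-TCAG symbols."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def codon_frequencies(seq):
    """Represent a gene as a 64-entry vector (dict) of codon frequencies."""
    counts = codon_counts(seq)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid codons")
    return {c: counts[c] / total for c in CODON_TABLE}


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Weight of each codon relative to the most used synonymous codon.

    Met, Trp and stop codons are excluded. `pseudocount` (an assumption of
    this sketch) replaces zero counts so that log(weight) stays finite.
    """
    counts = Counter()
    for s in reference_seqs:
        counts.update(codon_counts(s))
    weights = {}
    for aa in set(AMINO) - set("*MW"):
        synonymous = [c for c, a in CODON_TABLE.items() if a == aa]
        best = max(max(counts[c], pseudocount) for c in synonymous)
        for c in synonymous:
            weights[c] = max(counts[c], pseudocount) / best
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of the codons used in `seq`."""
    counts = codon_counts(seq)
    n = sum(counts[c] for c in weights)
    if n == 0:
        raise ValueError("sequence contains no scorable codons")
    return exp(sum(counts[c] * log(weights[c]) for c in weights) / n)


# Toy reference set standing in for highly expressed genes (hypothetical data).
weights = relative_adaptiveness(["GCTGCTGCTGCC"])
score = cai("GCTGCC", weights)
```

With only a handful of codons per gene, each frequency estimate rests on very few observations, which is the source of the noise the abstract describes.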
codon bias, supervised machine learning, translational selection, highly expressed genes, Random Forest
Chapter details
229-244.
published
Book details
Codon Evolution - Mechanisms and Models
Cannarozzi, Gina ; Schneider, Adrian
Oxford: Oxford University Press
2011.
9780199601665