Clustering of protein domains for functional and evolutionary studies

Goldstein, Pavle; Žučko, Jurica; Vujaklija, Dušica; Kriško, Anita; Hranueli, Daslav; Long, Paul F.; Etchebest, Catherine; Basrak, Bojan; Cullum, John

Data source: CROSBI

Clustering of protein domains for functional and evolutionary studies (CROSBI ID 156548)

Journal article | original scientific paper | international peer review

Goldstein, Pavle ; Žučko, Jurica ; Vujaklija, Dušica ; Kriško, Anita ; Hranueli, Daslav ; Long, Paul F. ; Etchebest, Catherine ; Basrak, Bojan ; Cullum, John Clustering of protein domains for functional and evolutionary studies // BMC bioinformatics, 10 (2009), 335, 11. doi: 10.1186/1471-2105-10-335

Responsibility statement

Authors

Goldstein, Pavle ; Žučko, Jurica ; Vujaklija, Dušica ; Kriško, Anita ; Hranueli, Daslav ; Long, Paul F. ; Etchebest, Catherine ; Basrak, Bojan ; Cullum, John

Basic data in the original language

Language

English

Title

Clustering of protein domains for functional and evolutionary studies

Abstract

Background: The number of protein family members defined by DNA sequencing is usually much larger than the number characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test of the quality of the clustering.

Results: An evolutionary split statistic is calculated for each column in a protein multiple sequence alignment; the statistic has a larger value when a column is better described by an evolutionary model that assumes clustering around two or more amino acids rather than a single amino acid. The user selects columns (typically the top-ranked columns) to construct a motif. The motif is used to divide the family into subtypes using a stochastic optimization procedure related to the deterministic annealing EM algorithm (DAEM), which yields a specificity score showing how well each family member is assigned to a subtype. The clustering obtained is not strongly dependent on the number of amino acids chosen for the motif. The robustness of this method was demonstrated using six well-characterized protein families: nucleotidyl cyclase, protein kinase, dehydrogenase, two polyketide synthase domains and small heat shock proteins. Phylogenetic trees did not allow accurate clustering for three of the six families.

Conclusion: The method clustered the families into functional subtypes with an accuracy of 90 to 100%. False assignments usually had a low specificity score.
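As a rough illustration of the clustering step described in the abstract, the sketch below fits a mixture of per-column amino-acid distributions to short motifs with a textbook deterministic annealing EM loop. It is not the authors' procedure: the abstract only says their optimizer is "related to" DAEM and does not define the specificity score. The function name `daem_cluster`, the temperature schedule `betas`, the pseudocount, the seeding rule, and the use of the maximum posterior as a stand-in for a specificity score are all assumptions made for this example. Motifs are assumed to be gap-free strings of equal length over the 20 standard amino acids.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def daem_cluster(motifs, k=2, betas=(0.25, 0.5, 0.75, 1.0), iters=10, pseudo=0.01):
    """Fit a k-component categorical mixture to equal-length motifs by DAEM.

    Returns (labels, scores): a hard subtype label per motif and its maximum
    posterior, used here only as a stand-in for a specificity score.
    """
    n, length = len(motifs), len(motifs[0])

    def matches(s, t):
        return sum(a == b for a, b in zip(s, t))

    # Deterministic, symmetry-breaking start (an assumption of this sketch):
    # seed with motif 0, then repeatedly add the motif least similar to the seeds.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(min(range(n),
                         key=lambda i: max(matches(motifs[i], motifs[s]) for s in seeds)))
    resp = []
    for m in motifs:
        raw = [1.0 + matches(m, motifs[s]) for s in seeds]
        resp.append([r / sum(raw) for r in raw])

    for beta in betas:  # inverse temperature rises towards 1
        for _ in range(iters):
            # M-step: mixing weights and per-column amino-acid probabilities.
            totals = [sum(r[c] for r in resp) for c in range(k)]
            weights = [(t + 1e-9) / (n + k * 1e-9) for t in totals]
            theta = []
            for c in range(k):
                cols = []
                for j in range(length):
                    counts = {a: pseudo for a in AMINO_ACIDS}
                    for m, r in zip(motifs, resp):
                        counts[m[j]] += r[c]
                    norm = sum(counts.values())
                    cols.append({a: v / norm for a, v in counts.items()})
                theta.append(cols)
            # Annealed E-step: posterior proportional to (weight * likelihood) ** beta.
            for i, m in enumerate(motifs):
                logs = []
                for c in range(k):
                    loglik = sum(math.log(theta[c][j][m[j]]) for j in range(length))
                    logs.append(beta * (math.log(weights[c]) + loglik))
                top = max(logs)
                unnorm = [math.exp(v - top) for v in logs]
                z = sum(unnorm)
                resp[i] = [u / z for u in unnorm]

    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    scores = [max(r) for r in resp]
    return labels, scores

# Toy motifs (invented for illustration, not data from the paper).
motifs = ["ACDEF", "ACDEF", "ACDEF", "ACDEG", "WYHKL", "WYHKL", "WYHKL", "WYHKM"]
labels, scores = daem_cluster(motifs)
for motif, label, score in zip(motifs, labels, scores):
    print(motif, label, round(score, 3))
```

The last E-step runs at beta = 1, so the returned scores are ordinary mixture posteriors. A low score flags a motif that sits between subtypes, which is the kind of case the abstract associates with false assignments.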

Keywords

protein families ; DNA sequences ; sequence criteria ; evolutionary split statistic ; clustering algorithm

Note

not recorded

Basic data in other languages

Language, title, abstract, keywords, note: not recorded

Publication data

Journal

BMC bioinformatics

Volume (issue)

10

Year

2009

Article number

335

Number of pages

11

Publication status

published

e-ISSN

1471-2105

DOI

10.1186/1471-2105-10-335

Related entities

Related persons

Pavle Goldstein (author)

Jurica Žučko (author)

Dušica Vujaklija (author)

Anita Kriško (author)

Daslav Hranueli (author)

Bojan Basrak (author)

Related institutions

Institut Ruđer Bošković (098) (author's institution)

Prehrambeno-biotehnološki fakultet (058) (author's institution)

Prirodoslovno-matematički fakultet, Matematički odjel, Zagreb (037) (author's institution)

Prirodoslovno-matematički fakultet, Zagreb (119) (author's institution)

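Related projects

Deterministički i probabilistički modeli u biologiji [Deterministic and probabilistic models in biology] (result of work on the project)

Generiranje potencijalnih lijekova u uvjetima in silico [Generation of potential drugs under in silico conditions] (result of work on the project)

Temeljna molekularno-biološka istraživanja streptomiceta [Basic molecular-biological research on streptomycetes] (result of work on the project)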

Field

Biology, Biotechnology, Mathematics

Links

doi.org

bmcbioinformatics.biomedcentral.com

Indexing

Scopus

Medline

Web of Science Core Collection, Science Citation Index Expanded (WoSCC-SCI-Exp)

Web of Science Core Collection, SCI-Exp, SSCI & A&HCI (WoSCC-SCI-Exp, SSCI, A&HCI)