Bibliographic record no. 829534
Extensive Complementarity between Gene Function Prediction Methods // Bioinformatics, 32 (2016), 23; 3645-3653 doi:10.1093/bioinformatics/btw532 (international peer review, article, scientific)
CROSBI ID: 829534
Title
Extensive Complementarity between Gene Function Prediction Methods
Authors
Vidulin, Vedrana; Šmuc, Tomislav; Supek, Fran
Source
Bioinformatics (ISSN 1367-4803) 32 (2016), 23; 3645-3653
Type, subtype and category of work
Journal papers, article, scientific
Keywords
gene function prediction; comparative genomics; Gene Ontology; random forest
Abstract
Motivation: The number of sequenced genomes rises steadily, but we still lack knowledge of the biological roles of many genes. Automated function prediction (AFP) is thus a necessity. We hypothesize that AFP approaches that draw on distinct genome features may be useful for predicting different types of gene functions, which motivates a systematic analysis of the benefits gained by obtaining and integrating such predictions.
Results: Our pipeline amalgamates 5,133,543 genes from 2,071 genomes in a single massive analysis that evaluates five established genomic AFP methodologies. While 1,227 Gene Ontology (GO) terms yielded reliable predictions, the majority of these functions were accessible to only one or two of the methods. Moreover, different methods tend to assign a given GO term to non-overlapping sets of genes. Thus, inferences made by diverse AFP methods display striking complementarity, both gene-wise and function-wise. Because of this, a viable integration strategy is to rely on the single most-confident prediction per gene/function, instead of enforcing agreement across multiple AFP methods. Using an information-theoretic approach, we estimate that current databases contain 29.2 bits/gene of known E. coli gene functions. This can be increased by up to 5.5 bits/gene using individual AFP methods, or by 11 additional bits/gene upon integration, thereby providing a high-ranking predictor on the CAFA2 community benchmark. The availability of more sequenced genomes boosts the predictive accuracy of AFP approaches and also the benefit from integrating them.
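The integration strategy summarized in the abstract (keeping the single most-confident prediction per gene/function rather than requiring agreement between methods) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each method's output is a dictionary mapping (gene, GO term) pairs to a confidence score in [0, 1]. The function names integrate_max and integrate_consensus, the method labels, gene IDs and scores are all hypothetical and invented for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical prediction format: (gene_id, go_term) -> confidence in [0, 1].
Pair = Tuple[str, str]
Predictions = Dict[Pair, float]


def integrate_max(method_preds: Dict[str, Predictions]) -> Dict[Pair, Tuple[float, str]]:
    """For each (gene, GO term) pair, keep only the single most confident
    prediction across all methods, plus the name of the method that made it."""
    best: Dict[Pair, Tuple[float, str]] = {}
    for method, preds in method_preds.items():
        for pair, score in preds.items():
            if pair not in best or score > best[pair][0]:
                best[pair] = (score, method)
    return best


def integrate_consensus(method_preds: Dict[str, Predictions],
                        threshold: float, min_methods: int = 2) -> Set[Pair]:
    """Contrasting strategy: keep a pair only if at least `min_methods`
    methods predict it with confidence >= threshold."""
    votes: Dict[Pair, int] = {}
    for preds in method_preds.values():
        for pair, score in preds.items():
            if score >= threshold:
                votes[pair] = votes.get(pair, 0) + 1
    return {pair for pair, n in votes.items() if n >= min_methods}


# Hypothetical toy data: two methods whose confident predictions barely overlap.
example_preds: Dict[str, Predictions] = {
    "method_A": {("geneA", "GO:0006412"): 0.9, ("geneB", "GO:0005975"): 0.4},
    "method_B": {("geneA", "GO:0006412"): 0.3, ("geneC", "GO:0006281"): 0.8},
}

if __name__ == "__main__":
    print(integrate_max(example_preds))
    print(integrate_consensus(example_preds, threshold=0.5))
```

With this toy data the max-confidence rule retains all three gene/term pairs, whereas a two-method consensus at threshold 0.5 retains none, because the two hypothetical methods make confident predictions on different genes. That contrast is the kind of complementarity the abstract describes.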
Original language
English
Scientific fields
Biology, Computer Science, Biotechnology
RELATED RECORDS
Projects:
ICT-2013-612944
EK-FP7-317532 - Foundational research on MULTIlevel comPLEX networks and systems (MULTIPLEX) (Zlatić, Vinko; Šmuc, Tomislav, EC) (CroRIS)
EK-FP7-316289 - Enhancing the innovation potential in South-East Europe through molecular solutions in research and development (INNOMOL) (Vugrek, Oliver, EC) (CroRIS)
HRZZ-IP-2013-11-5660 - Multidisciplinary approach to the discovery of drugs targeting cancer stem cells: the role of potassium transport (MultiCaST) (Kralj, Marijeta, HRZZ - 2013-11) (CroRIS)
HRZZ-IP-2013-11-9623 - Machine learning methods for in-depth analysis of complex data structures (DescriptiveInduction) (Gamberger, Dragan, HRZZ - 2013-11) (CroRIS)
Institutions:
Ruđer Bošković Institute, Zagreb
The journal is indexed in:
- Current Contents Connect (CCC)
- Web of Science Core Collection (WoSCC)
- Science Citation Index Expanded (SCI-EXP)
- SCI-EXP, SSCI and/or A&HCI
- Scopus
- MEDLINE