Bibliographic record number: 202974
Navigating the Similarity Space: domain architecture prediction using support vector machines
Navigating the Similarity Space: domain architecture prediction using support vector machines // Workshop on Practical Approaches to Computational Biology
Opatija, Croatia, 2005 (invited lecture, not peer-reviewed, pp presentation, scientific)
CROSBI ID: 202974
Title
Navigating the Similarity Space: domain architecture prediction using support vector machines
Authors
Vlahoviček, Kristian
Type, subtype and category of work
Conference abstracts, pp presentation, scientific
Conference
Workshop on Practical Approaches to Computational Biology
Place and date
Opatija, Croatia, 01.09.2005 - 04.09.2005
Type of participation
Invited lecture
Type of review
Not peer-reviewed
Keywords
-
Abstract
The increasing amount of primary biological information originating from genome sequencing projects calls for new approaches to large-scale classification and annotation. We present a method based on sequence similarity that can be applied to the functional characterization of whole proteins as well as to the prediction of domain architecture. The method consists of building an exemplar-based database, preprocessing it by running a database vs. database comparison, and calculating parameter values for biologically significant similarities. A support vector machine (SVM) classifier is then built from the calculated values for each similarity group. Comparing an unknown query sequence against a database of ‘similarities’ and validating the comparison with the SVM results in a biologically relevant annotation. Performance evaluation shows an overall prediction success rate of 90% on a set of 140 thousand protein domains divided into 4000 domain groups, each containing 3-7000 members, with median specificity and sensitivity per group of 98% and 93%, respectively. The ease of implementation and the speed of prediction make the method an interesting candidate for large-scale annotation projects, as it involves minimal manual intervention in both training and prediction. Further applications in the prediction of function will be discussed. The database of annotated protein domains and the domain architecture prediction system are available via the web interface at http://www.icgeb.trieste.it/sbase.
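The abstract reports an overall success rate together with median per-group specificity and sensitivity, but does not spell out how these were computed. The following is a minimal sketch of one common reading, assuming one-vs-rest counting for each domain group; the function names and the toy group labels are hypothetical and are not taken from the original work.

```python
from statistics import median


def overall_success_rate(true_groups, predicted_groups):
    """Fraction of domains whose predicted group equals the true group."""
    correct = sum(t == p for t, p in zip(true_groups, predicted_groups))
    return correct / len(true_groups)


def per_group_metrics(true_groups, predicted_groups):
    """Return {group: (sensitivity, specificity)} with one-vs-rest counting.

    Assumption: each group is scored as "this group" versus "all others".
    """
    groups = set(true_groups)
    if len(groups) < 2:
        raise ValueError("need at least two groups to define specificity")
    metrics = {}
    for group in groups:
        tp = fp = tn = fn = 0
        for t, p in zip(true_groups, predicted_groups):
            if t == group and p == group:
                tp += 1  # member correctly assigned to the group
            elif t != group and p == group:
                fp += 1  # non-member wrongly assigned to the group
            elif t == group and p != group:
                fn += 1  # member missed
            else:
                tn += 1  # non-member correctly left out
        metrics[group] = (tp / (tp + fn), tn / (tn + fp))
    return metrics


def median_metrics(true_groups, predicted_groups):
    """Median sensitivity and median specificity across all groups."""
    values = per_group_metrics(true_groups, predicted_groups).values()
    sensitivities, specificities = zip(*values)
    return median(sensitivities), median(specificities)


if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    true = ["A", "A", "A", "B", "B", "C", "C", "C", "C", "C"]
    pred = ["A", "A", "B", "B", "B", "C", "C", "C", "C", "A"]
    print(overall_success_rate(true, pred))
    print(median_metrics(true, pred))
```

Taking the median over groups, rather than pooling all predictions, keeps a few very large groups from dominating the summary, which matters when group sizes range as widely as reported.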
Original language
English
Scientific fields
Biology, Computer science
RELATED TO THIS WORK
Projects:
0119161
Institutions:
Faculty of Science (Prirodoslovno-matematički fakultet), Zagreb
Profiles:
Kristian Vlahoviček
(author)