Bibliographic record number: 320292
Modeling global properties of proteins
Modeling global properties of proteins // Book of Abstracts, Regional Biophysics Conference 2007 / Zimanyi, Laszlo ; Kota, Zoltan ; Szalontai, Balazs (eds.).
Balatonfuered: -, 2007. pp. 63-63 (invited lecture, international peer review, abstract, scientific)
CROSBI ID: 320292
Title
Modeling global properties of proteins
Authors
Lučić, Bono
Type, subtype and category of work
Conference abstracts, abstract, scientific
Source
Book of Abstracts, Regional Biophysics Conference 2007
/ Zimanyi, Laszlo ; Kota, Zoltan ; Szalontai, Balazs - Balatonfuered, 2007, 63-63
Conference
Regional Biophysics Conference 2007
Place and date
Balatonfüred, Hungary, 21.08.2007 - 25.08.2007
Type of participation
Invited lecture
Type of peer review
International peer review
Keywords
modeling global properties of proteins; folding/unfolding rates; secondary structure content; protein structural class; calculated protein structure attributes
Abstract
Modeling global structural properties of proteins (such as folding type, secondary structure content, folding or unfolding rate constants, location of a protein in the cell, etc.) from the structure is one of the most important challenges of computational structural biology and biophysics, and it is the first step in the analysis of the properties of a new protein sequence. There have been many attempts to predict global protein features, but many of the models that have been developed and published include many non-significant parameters. Such models are not highly accurate, and certainly not as accurate as was claimed in the original publications. I will illustrate overfitting problems in modeling global features of proteins using, as examples, publications on modeling protein folding rate constants [1, 2] and protein secondary structure contents [3, 4]. Folding and unfolding rate constants are modeled using averages of physical/chemical properties of the amino acid residues of a protein [1, 2]. The authors selected many parameters for their models relative to the total number of proteins in the data sets. For this reason, the correlations in the developed models are in fact due to chance, and although the statistical parameters of the fit and of the leave-one-out procedure appear excellent, they are a consequence of random correlation. To illustrate this, we recalculated the model parameters for 10 proteins of mixed class (eq. 4 in ref. 1) using four parameters (polarity, refractive index, solvent-accessible surface area upon unfolding, and unfolding entropy change of hydration), each given to three decimal places for each protein, and obtained exactly the same model parameters (correlation coefficient r = 0.994). However, when we then used the same parameters with each value rounded to two decimal places, the model parameters changed drastically, as did the statistical parameters (r = 0.885).
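The chance-correlation effect described above can be illustrated with a small simulation. The following is a minimal sketch, not the calculation from the abstract or from refs. 1 and 2: it fits ordinary least squares to purely random descriptors and a purely random response, and reports the average R² obtained by chance alone. The function names (`solve`, `r_squared`, `mean_chance_r2`), the Gaussian random data, the number of trials, and the comparison sample size of 100 are all assumptions made for illustration.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept (normal equations)."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def mean_chance_r2(n_samples, n_params, trials=200, seed=0):
    """Average R^2 when both descriptors and response are pure noise (hypothetical setup)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        X = [[rng.gauss(0, 1) for _ in range(n_params)] for _ in range(n_samples)]
        y = [rng.gauss(0, 1) for _ in range(n_samples)]
        total += r_squared(X, y)
    return total / trials


if __name__ == "__main__":
    # 10 samples with 4 descriptors, versus an assumed larger set of 100 samples.
    print("n=10,  p=4:", round(mean_chance_r2(10, 4), 3))
    print("n=100, p=4:", round(mean_chance_r2(100, 4), 3))
```

For pure noise, the expected R² of such a fit is p/(n-1), so a model with four descriptors on ten samples already explains a large share of the variance by chance, while the same four descriptors on a much larger sample explain almost none. The sketch does not include any descriptor selection step, which would inflate chance correlation further.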
In the final model for all classes (29 proteins), 16 parameters were selected, which is an unambiguous indication that the model is overfitted. The same holds for all other models developed in refs. 1 and 2. Improvements of the models for folding rate constants through the inclusion of novel parameters, based on properties of amino acid residues and their distribution along the sequence, will be presented.
The second example concerns modeling protein secondary structure content on four data sets containing 166, 262, 398 and 475 soluble proteins [3]. The model developed in ref. 3 involved 57 independent parameters (optimized constants) for all three secondary structure types (α, β and coil) in the linear models and 247 optimized parameters in the nonlinear models. By selecting a small number (only five) of the most important parameters (from among the 20 frequencies of amino acid residues and the 210 products of these frequencies, of which the Ala × Leu product was the most important), I obtained much simpler and better models. The mean absolute error for the data set of 262 proteins over the three secondary structure contents is 9% with the model having only five parameters for each of the three secondary structure types, compared with the corresponding error of 11% obtained in ref. 3. These models can be improved by including autocorrelation functions computed from relevant properties of amino acid residues for each protein sequence, which will be illustrated.
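The descriptor pool mentioned above (20 residue frequencies plus 210 products of frequencies) can be sketched as follows. This is an illustrative reconstruction, not the author's code: it assumes the 210 products are all unordered pairs of the 20 frequencies including squares, which is the reading consistent with the count 20 × 21 / 2 = 210. The names `composition`, `descriptors`, `mean_absolute_error`, and the `"AxL"` key format are hypothetical.

```python
from itertools import combinations_with_replacement

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Relative frequency of each of the 20 standard residues in a sequence."""
    seq = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not seq:
        raise ValueError("sequence contains no standard amino acid residues")
    n = len(seq)
    return {aa: seq.count(aa) / n for aa in AMINO_ACIDS}


def descriptors(sequence):
    """20 frequencies plus 210 pairwise products (assumed to include squares)."""
    freq = composition(sequence)
    feats = dict(freq)
    for a, b in combinations_with_replacement(AMINO_ACIDS, 2):
        feats[a + "x" + b] = freq[a] * freq[b]  # e.g. "AxL" for Ala x Leu
    return feats


def mean_absolute_error(observed, predicted):
    """Mean absolute difference between observed and predicted contents."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    feats = descriptors("ACDEFGHIKLMNPQRSTVWYAL")  # made-up example sequence
    print(len(feats), "descriptors; Ala x Leu =", feats["AxL"])
```

A model of the kind described would then pick a handful of these 230 descriptors (five in the abstract) for each secondary structure type. The selection procedure itself is not specified in the abstract and is not reproduced here.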
Original language
English
Scientific fields
Chemistry
RELATED TO THIS WORK
Projects:
098-1770495-2919 - Development of methods for modeling properties of bioactive molecules and proteins (Lučić, Bono, MZOS) (CroRIS)
Institutions:
Institut "Ruđer Bošković", Zagreb
Profiles:
Bono Lučić
(author)