Bibliographic record number: 520232

Functional classification of Adenylation domains by Latent Semantic Indexing (LSI)


Baranašić, Damir
Functional classification of Adenylation domains by Latent Semantic Indexing (LSI), 2011, graduate thesis, graduate level, Prehrambeno-biotehnološki fakultet, Zagreb


CROSBI ID: 520232

Title
Functional classification of Adenylation domains by Latent Semantic Indexing (LSI)

Authors
Baranašić, Damir

Type, subtype and category of work
Theses, graduate thesis, graduate level

Faculty
Prehrambeno-biotehnološki fakultet

Place
Zagreb

Date
01.07

Year
2011

Pages
37

Mentor
Starčević, Antonio

Direct supervisor
Žučko, Jurica

Keywords
LSI; A-domains; protein tokenization; protein clustering; SVD; dimension reduction; specificity prediction

Abstract
Latent semantic indexing (LSI) is an information retrieval method that has only relatively recently been introduced into computational biology. In this work, LSI was adapted to predict the amino acid substrates activated by adenylation domains (A-domains). A-domains are obligatory subunits of non-ribosomal peptide synthetase (NRPS) modules; they recognise and activate the amino acid to be incorporated into the final product, a non-ribosomally synthesised peptide. Knowing the specific substrate of every sequenced A-domain would make it possible to predict the final product of a linear NRPS and perhaps to design novel biologically active natural products. Two methods were used to vectorize A-domain protein sequences and to construct the resulting term-document matrix: the “n-grams” method and a novel “tokenization” method. The “n-grams” method finds n-peptides in the protein sequence, whereas the “tokenization” method creates specific “tokens” that couple amino acid residues with their corresponding positions in the multiple sequence alignment. LSI uses a mathematical method called singular value decomposition (SVD) to reduce the unreliable information in the term-document matrix. The number of dimensions used in the analysis was obtained computationally and was found to agree with the empirically obtained optimal number of dimensions. Predictions were satisfactory with both “n-grams” and “tokenization” as vectorization methods, and the “tokenization” method generally showed better precision and robustness. A novel clustering method based on LSI was also developed. It gave satisfactory clustering results without the need to guess the number of clusters in advance, as methods such as k-means clustering require.
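
The pipeline described in the abstract (vectorize sequences, build a term-document matrix, reduce it with SVD) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the raw-count weighting, the made-up toy alignment rows and the hand-picked number of dimensions k = 2 are all assumptions chosen for illustration, and the computational procedure the thesis uses to choose the number of dimensions is not reproduced here.

```python
import numpy as np

def ngram_terms(seq, n=3):
    """Overlapping n-peptides of an unaligned protein sequence ("n-grams")."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def position_tokens(aligned_seq):
    """Residue-plus-column tokens from one alignment row; gap columns are skipped."""
    return [f"{res}{col}" for col, res in enumerate(aligned_seq, start=1) if res != "-"]

def term_document_matrix(docs):
    """Rows are terms, columns are sequences, entries are raw counts (an assumption)."""
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc:
            matrix[index[term], j] += 1
    return vocab, matrix

def lsi_document_vectors(matrix, k):
    """Coordinates of each sequence in the k leading SVD dimensions."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (s[:k, None] * vt[:k]).T  # one row per sequence

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy alignment rows, invented for this sketch (not real A-domains).
aligned = ["DAWT-VAA", "DAWTIVAA", "DLYN-LSL", "DLYNALSL"]
docs = [position_tokens(row) for row in aligned]
vocab, matrix = term_document_matrix(docs)
vecs = lsi_document_vectors(matrix, k=2)  # k chosen by hand here

# Rows 0 and 1 share most tokens, so they should be closer than rows 0 and 2.
similar = cosine(vecs[0], vecs[1])
dissimilar = cosine(vecs[0], vecs[2])
```

To use the “n-grams” variant instead, the documents would be built from unaligned sequences with `ngram_terms`. A substrate prediction could then, for example, assign a query sequence the label of its nearest neighbours in the reduced space, although the abstract does not state which decision rule the thesis uses.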

Original language
English

Scientific fields
Biotechnology



RELATED TO THIS WORK


Projects:
0982560
058-0000000-3475 - Generating potential drugs in silico (Hranueli/Jurica Žučko, Daslav, MZOS) (CroRIS)

Institutions:
Prehrambeno-biotehnološki fakultet, Zagreb

Profiles:

Jurica Žučko (mentor)

Antonio Starčević (mentor)

Links to the full text of the work:

Access to the full text of the work: bioinformatics.pbf.hr

Cite this publication:

Baranašić, Damir
Functional classification of Adenylation domains by Latent Semantic Indexing (LSI), 2011, graduate thesis, graduate level, Prehrambeno-biotehnološki fakultet, Zagreb
Baranašić, D. (2011) 'Functional classification of Adenylation domains by Latent Semantic Indexing (LSI)', graduate thesis, graduate level, Prehrambeno-biotehnološki fakultet, Zagreb.
@phdthesis{phdthesis,
  author = {Barana\v{s}i\'{c}, Damir},
  year = {2011},
  pages = {37},
  keywords = {LSI, A-domains, protein tokenization, protein clustering, SVD, dimension reduction, specificity prediction},
  title = {Functional classification of Adenylation domains by Latent Semantic Indexing (LSI)},
  publisherplace = {Zagreb}
}