Bibliographic record number: 794777
Novel Bioinformatics Tool for the Prediction and Analysis of G-Quadruplexes
Novel Bioinformatics Tool for the Prediction and Analysis of G-Quadruplexes, 2015, graduate thesis, graduate, Faculty of Engineering and Information Technologies, Sarajevo, Bosnia and Herzegovina
CROSBI ID: 794777
Title
Novel Bioinformatics Tool for the Prediction and Analysis of G-Quadruplexes
Authors
Muhović, Imer
Type, subtype and category of work
Theses, graduate thesis, graduate
Faculty
Faculty of Engineering and Information Technologies
Place
Sarajevo, Bosnia and Herzegovina
Date
12.06
Year
2015
Pages
65
Mentor
Marjanović, Damir
Direct supervisor
Doluca, Osman
Keywords
G-Quadruplex; bioinformatics; artificial neural network; data mining; machine learning; pharmacogenetics
Abstract
G-quadruplexes are novel sequences of interest that have recently been implicated in a regulatory role in the chromosome. Tools exist to predict their possible locations, but they offer few features. Using an artificial neural network, we have created a method to predict the melting temperature of such sequences. The tool was developed in several phases. We wanted to create an intuitive, easy-to-use tool that selects all nucleotides of interest capable of contributing to a G-quadruplex structure and analyzes their melting temperature, in order to find the structures most likely to form under physiological conditions. We used the Python programming language to construct the core algorithm, which uses regular expressions to find all stretches of guanines in the given sequence. It then assigns identity values to those G-boxes and builds a tree structure from them. The tree is then traversed to obtain all possible combinations of the G-boxes, and thus all possible G-quadruplexes. Duplicates are then pruned, and the G-quadruplexes are run through an artificial neural network that predicts the melting temperature of each G-quadruplex. We used the PyBrain library for Python to construct the artificial neural network, using a previously published dataset of 260 quadruplexes that included their sequences, the physiological conditions under which they formed G-quadruplexes, and their melting temperatures. After reviewing the literature, we decided to use the melting temperature as a predictor of quadruplex stability. The artificial neural network was trained on a subset of 108 sequences that formed G-quadruplexes in solution with K+ ions, which are thought to be biologically relevant because K+ is present at high concentrations within human cells.
We combined the algorithm for finding possible G-quadruplexes with the neural network into a web tool built with the Django web development framework. The web tool allows even novice users to input their sequence data, choose the parameters to their liking, and obtain the most likely G-quadruplex to form within the sequence. The results are presented in tabular format and are available for download in multiple spreadsheet formats. Our research into the use of neural networks has left us wanting larger, more complete datasets: 260 sequences is a small sample that leaves many variables untested, such as the size of the G-blocks and the nucleotide composition of the loops, and gives no information about the conformation that the G-quadruplex adopts. Despite these issues, we hope to have created a tool that will be usable by the wider scientific community for years to come.
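The G-box search and combination steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis code: the function names, the `min_len` parameter, and the rule that a G-box is a run of at least two guanines are all assumptions. It also swaps the tree traversal for `itertools.combinations`, and it treats two candidates as duplicates when they cover the same span of the sequence, which may differ from the pruning rule actually used.

```python
import re
from itertools import combinations


def find_g_boxes(sequence, min_len=2):
    """Return (start, end) spans of every maximal run of guanines.

    min_len is a hypothetical parameter: the shortest run counted as a G-box.
    """
    pattern = re.compile("G{%d,}" % min_len)
    return [match.span() for match in pattern.finditer(sequence.upper())]


def candidate_quadruplexes(sequence, min_len=2):
    """Enumerate candidate G-quadruplexes as (start, end, subsequence).

    Every ordered choice of four G-boxes is one candidate. Candidates that
    cover the same span are pruned as duplicates (an assumed rule).
    """
    seq = sequence.upper()
    boxes = find_g_boxes(seq, min_len)
    seen = set()
    candidates = []
    for combo in combinations(boxes, 4):
        start, end = combo[0][0], combo[3][1]
        if (start, end) in seen:
            continue  # same span already reported
        seen.add((start, end))
        candidates.append((start, end, seq[start:end]))
    return candidates


if __name__ == "__main__":
    # Human telomeric repeat, used here only as a familiar test input.
    for start, end, subseq in candidate_quadruplexes("GGGTTAGGGTTAGGGTTAGGG"):
        print(start, end, subseq)
```

In a pipeline like the one described, each candidate subsequence would then be passed to the trained network to obtain a predicted melting temperature. Enumerating all four-box combinations grows quickly with the number of G-boxes, so a real tool would likely need to bound loop lengths or the search window.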
Original language
English
Scientific fields
Biology