Bibliographic record no. 884152

Classification of Large-Scale Biological Annotations Using Word Embeddings Derived from Corpora of Biomedical Research Literature


Baćac, Adriano
Classification of Large-Scale Biological Annotations Using Word Embeddings Derived from Corpora of Biomedical Research Literature, 2017, graduate thesis, graduate, Fakultet Elektrotehnike i Računarstva, Zagreb


CROSBI ID: 884152

Title
Classification of Large-Scale Biological Annotations Using Word Embeddings Derived from Corpora of Biomedical Research Literature

Authors
Baćac, Adriano

Type, subtype and category of work
Theses, graduate thesis, graduate

Faculty
Fakultet Elektrotehnike i Računarstva

Place
Zagreb

Date
10.07

Year
2017

Pages
45

Mentor
Šikić, Mile

Keywords
word embedding, Word2vec, GloVe, RNN, LSTM, phenotype classification, corpus specificity

Abstract
Custom Word2vec and GloVe embeddings were trained on scientific literature in the biomedical domain, together with three classification methods for discriminating phenotype traits: two based on aggregating word embeddings and one based on recurrent neural networks. The word embeddings were trained on a large corpus of scientific articles and on more subject-specific subsets of it. Classification performance was tested on 6 document sources. Word2vec was shown to perform better when trained on a subject-specific subset comprising 4.9% of the articles than when trained on the entire corpus. The recurrent neural network models suffered from overfitting, possibly because the documents were too long or the training set was too small. Although the proposed models did not outperform a support vector machine with bag-of-words features, using the aggregation methods alongside this baseline model increased the number of correctly classified minority-class examples for some phenotype traits by around 10%.

Original language
English

Scientific fields
Computing



AFFILIATIONS OF THE WORK


Institutions:
Fakultet elektrotehnike i računarstva, Zagreb

Profiles:

Mile Šikić (mentor)

Links to the full text of the work:

Access to the full text of the work

Cite this publication:

Baćac, Adriano
Classification of Large-Scale Biological Annotations Using Word Embeddings Derived from Corpora of Biomedical Research Literature, 2017, graduate thesis, graduate, Fakultet Elektrotehnike i Računarstva, Zagreb
Baćac, A. (2017) 'Classification of Large-Scale Biological Annotations Using Word Embeddings Derived from Corpora of Biomedical Research Literature', graduate thesis, graduate, Fakultet Elektrotehnike i Računarstva, Zagreb.
@phdthesis{phdthesis, author = {Ba\'{c}ac, Adriano}, year = {2017}, pages = {45}, keywords = {word embedding, Word2vec, GloVe, RNN, LSTM, phenotype classification, corpus specificity}, title = {Classification of Large-Scale Biological Annotations Using Word Embeddings Derived from Corpora of Biomedical Research Literature}, keyword = {word embedding, Word2vec, GloVe, RNN, LSTM, phenotype classification, corpus specificity}, publisherplace = {Zagreb} }



