Bibliographic record number: 884152
Classification of Large-Scale Biological Annotations Using Word Embeddings Derived from Corpora of Biomedical Research Literature
Classification of Large-Scale Biological Annotations Using Word Embeddings Derived from Corpora of Biomedical Research Literature, 2017, master's thesis, graduate level, Fakultet Elektrotehnike i Računarstva, Zagreb
CROSBI ID: 884152
Title
Classification of Large-Scale Biological Annotations Using Word Embeddings Derived from Corpora of Biomedical Research Literature
Authors
Baćac, Adriano
Type, subtype, and category of work
Theses, master's thesis, graduate level
Faculty
Fakultet Elektrotehnike i Računarstva
Place
Zagreb
Date
10.07
Year
2017
Pages
45
Mentor
Šikić, Mile
Keywords
word embedding, Word2vec, GloVe, RNN, LSTM, phenotype classification, corpus specificity
Abstract
Custom Word2vec and GloVe embeddings were trained on scientific literature in the biomedical domain, along with three classification methods for discriminating phenotype traits: two based on aggregating word embeddings and one based on recurrent neural networks. The word embeddings were trained on a large corpus of scientific articles and on its more subject-specific subsets. Classification performance was tested on 6 document sources. Word2vec was shown to achieve better performance when trained on a subject-specific subset comprising 4.9% of the articles than when trained on the entire corpus. The recurrent neural network models suffered from overfitting, possibly because the documents were too long or the training set too small. Although the proposed models did not outperform a support vector machine using bag-of-words features, using the aggregation methods alongside the baseline model was shown to increase the number of correctly classified minority-class examples for some phenotype traits by around 10%.
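The abstract does not say how the word embeddings were aggregated into document representations. As an illustration only, the sketch below assumes the simplest common choice, averaging the vectors of all in-vocabulary tokens. The function name `mean_embedding`, the toy vocabulary, and the vector values are hypothetical and are not taken from the thesis; in practice the vectors would come from a trained Word2vec or GloVe model.

```python
import numpy as np

# Hypothetical toy embeddings, invented for illustration. A real setup would
# load vectors from a trained Word2vec or GloVe model instead.
EMBEDDINGS = {
    "aerobic": np.array([0.9, 0.1, 0.0]),
    "growth": np.array([0.5, 0.5, 0.0]),
    "spore": np.array([0.0, 0.2, 0.8]),
}


def mean_embedding(tokens, embeddings, dim):
    """Average the vectors of in-vocabulary tokens.

    Returns a zero vector when no token is in the vocabulary, so that every
    document maps to a fixed-length feature vector.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


# Tokens missing from the vocabulary ("was", "observed") are simply skipped.
doc = "aerobic growth was observed".split()
doc_vector = mean_embedding(doc, EMBEDDINGS, dim=3)
```

The resulting fixed-length vector could then be passed to any standard classifier. Averaging discards word order, which is one plausible reason to compare it against a sequence model such as an LSTM.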
Original language
English
Scientific fields
Computing
AFFILIATIONS OF THE WORK
Institutions:
Fakultet elektrotehnike i računarstva, Zagreb
Profiles:
Mile Šikić
(mentor)