Bibliographic record no.: 1121912
Classification and Information Extraction from Documents in the Domain of Culture
Classification and Information Extraction from Documents in the Domain of Culture, 2021, master's thesis, graduate level, Department of Informatics, Rijeka
CROSBI ID: 1121912
Title
Classification and Information Extraction from Documents in the Domain of Culture
Authors
Petar Kristijan Bogović
Type, subtype and category of work
Theses, master's thesis, graduate level
Faculty
Department of Informatics
Place
Rijeka
Date
30.03
Year
2021
Pages
56
Mentor
Martinčić-Ipšić, Sanda
Keywords
Information extraction, classification, named entity recognition, topic modelling, keyphrase extraction, culture, policy, society
Abstract
The main goal of this thesis is to develop procedures for the computational analysis of documents in the field of culture, cultural policies and activities. The collected documents first need to be preprocessed and prepared for further processing, e.g. by lemmatization, stemming, and other NLP procedures. In this thesis, several NLP procedures will be implemented: classification, automatic extraction of keywords and locations, and topic modeling. Automatic text classification will assign documents to predefined categories of cultural policy impacts on broader social aspects, using a standard bag-of-words model for document representation and machine learning algorithms such as Naive Bayes, Support Vector Machine and Random Forest. Automatic keyword and location extraction will be implemented using the MAUI keyword extraction method and Named Entity Recognition with available tools. Topic modeling will be performed using Latent Dirichlet Allocation (LDA) and evaluated using the coherence of the obtained topics.
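As an illustration of the classification step named in the abstract, the following is a minimal sketch of a bag-of-words representation combined with a multinomial Naive Bayes classifier, written with the Python standard library only. It is not the thesis implementation: the toy documents, the category labels `health` and `urban`, and the function names `tokenize`, `train_nb` and `predict_nb` are all assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus; the real thesis data and categories are not shown here.
DOCS = [
    "museum art therapy improves wellbeing of patients",
    "music program supports mental health recovery",
    "festival revitalizes the old city district",
    "cultural centre renovation transforms urban neighbourhood",
]
LABELS = ["health", "health", "urban", "urban"]


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (the 'bag of words')."""
    return re.findall(r"[a-z]+", text.lower())


def train_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior, using add-one smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    # Words never seen in training are ignored.
    counts = Counter(t for t in tokenize(text) if t in vocab)
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word, n in counts.items():
            score += n * math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(DOCS, LABELS)
print(predict_nb(model, "art therapy and mental health"))
print(predict_nb(model, "renovation of the city district"))
```

In practice a library implementation with TF-IDF weighting and cross-validated evaluation would be the more likely choice; the sketch only shows how word counts per category drive the decision.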
Original language
English
Scientific fields
Computing, Information and communication sciences
RELATED PROJECTS AND INSTITUTIONS
Projects:
NadSve-Sveučilište u Rijeci-uniri-drustv-18-20 - Keyword extraction and text summarization based on representation in language networks-LangNet (LangNet) (Martinčić-Ipšić, Sanda, NadSve - Call for research support funding at the University of Rijeka for 2018 - projects of experienced scientists and artists) (CroRIS)
EK-H2020-870935 - Measuring the Social Dimension of Culture (MESOC) (EK - H2020-SC6-TRANSFORMATIONS-2019) (CroRIS)
Institutions:
Faculty of Informatics and Digital Technologies, Rijeka