Classification and Information Extraction from Documents in the Domain of Culture (CROSBI ID 440652)

Thesis | master's thesis

Petar Kristijan Bogović. Classification and Information Extraction from Documents in the Domain of Culture / Martinčić-Ipšić, Sanda (mentor). Rijeka, 2021

Responsibility details

Petar Kristijan Bogović

Martinčić-Ipšić, Sanda

English

Classification and Information Extraction from Documents in the Domain of Culture

The main goal of this thesis is to develop procedures for the computational analysis of documents in the field of culture, cultural policies and activities. The collected documents are first preprocessed and prepared for further processing, e.g. by lemmatization, stemming, and other NLP procedures. Several NLP procedures are then implemented: classification, automatic extraction of keywords and locations, and topic modeling. Automatic text classification assigns documents to predefined categories of cultural policy impacts on broader social aspects, using a standard bag-of-words model for document representation and machine learning algorithms such as Naive Bayes, Support Vector Machine and Random Forest. Automatic keyword and location extraction is implemented using the Maui keyword extraction method and Named Entity Recognition with available tools. Topic modeling is performed using Latent Dirichlet Allocation (LDA) and evaluated using the coherence of the obtained topics.
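As a rough illustration of the classification step described in the abstract, the following is a minimal sketch of a bag-of-words representation combined with a multinomial Naive Bayes classifier, written in plain Python. It is not the thesis implementation: the class name `NaiveBayesClassifier`, the `tokenize` helper, the category labels and the four toy documents are all hypothetical and chosen only for illustration. Lemmatization and stemming are omitted.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on whitespace.
    # A real pipeline would also lemmatize or stem the tokens.
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial Naive Bayes over bag-of-words counts (illustrative sketch)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            bag = Counter(tokenize(doc))         # bag-of-words for one document
            self.word_counts[label].update(bag)
            self.vocab.update(bag)
        return self

    def predict(self, doc):
        bag = Counter(tokenize(doc))
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior of the class.
            score = math.log(n_docs / total_docs)
            # Denominator with add-one (Laplace) smoothing.
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word, count in bag.items():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                score += count * math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this example only.
train_docs = [
    "museum exhibition attracts tourists and boosts local economy",
    "festival funding creates jobs in the city",
    "theatre workshop improves community wellbeing",
    "library program supports health and social inclusion",
]
train_labels = ["economy", "economy", "wellbeing", "wellbeing"]

model = NaiveBayesClassifier().fit(train_docs, train_labels)
prediction = model.predict("festival boosts tourists economy")
```

The same bag-of-words counts could be passed to other classifiers named in the abstract, such as a Support Vector Machine or Random Forest; Naive Bayes is used here only because it fits in a few lines without external libraries.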

Information extraction, classification, named entity recognition, topic modelling, keyphrase extraction, culture, policy, society

Publication details

56

30.03.2021.

defended

Degree-granting institution details

Rijeka

Associated fields

Information and communication sciences, Computing