CroRIS - CROSBI

Data source: CROSBI

Classification and Information Extraction from Documents in the Domain of Culture (CROSBI ID 440652)

Thesis | master's thesis

Petar Kristijan Bogović. Classification and Information Extraction from Documents in the Domain of Culture / Martinčić-Ipšić, Sanda (mentor); Rijeka, 2021

Responsibility data

Authors

Petar Kristijan Bogović

Mentors

Martinčić-Ipšić, Sanda

Basic data in the original language
Basic data in other languages

Language

English

Title

Classification and Information Extraction from Documents in the Domain of Culture

Abstract

The main goal of this thesis is to develop procedures for the computational analysis of documents in the field of culture, cultural policies and activities. The collected documents first need to be preprocessed and prepared for further processing, e.g. by lemmatization, stemming, and other NLP procedures. In this thesis, several NLP procedures will be implemented: classification, automatic extraction of keywords and locations, and topic modeling. Automatic text classification will be implemented to classify documents into predefined categories of cultural policy impacts on broader social aspects, using a standard bag-of-words model for document representation and machine learning algorithms such as Naive Bayes, Support Vector Machine and Random Forest. Automatic keyword and location extraction will be implemented using the MAUI keyword extraction method and Named Entity Recognition with available tools. Topic modeling will be performed using Latent Dirichlet Allocation (LDA) and evaluated using the coherence of the obtained topics.
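As a rough illustration of the classification step named in the abstract, the following is a minimal sketch of a bag-of-words representation combined with a multinomial Naive Bayes classifier, written with the Python standard library only. It is not the thesis's implementation: the toy documents, the category labels "economy" and "social", and the function names `tokenize`, `train_nb` and `predict_nb` are all assumptions made for this example, and the simple tokenizer stands in for the lemmatization and stemming the abstract mentions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on non-letters (a stand-in for real preprocessing)."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def train_nb(docs, labels):
    """Collect per-class document counts and per-class word counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)  # bag of words: order is discarded
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    # Words never seen in training are ignored.
    bag = Counter(t for t in tokenize(text) if t in vocab)
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word, count in bag.items():
            score += count * math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus; the categories are invented for illustration.
train_docs = [
    "the museum exhibition increased tourism revenue and local jobs",
    "festival spending boosted the regional economy and employment",
    "community theatre workshops improved social cohesion and inclusion",
    "library programmes strengthened community participation and wellbeing",
]
train_labels = ["economy", "economy", "social", "social"]

model = train_nb(train_docs, train_labels)
prediction = predict_nb(model, "the concert series created jobs and tourism revenue")
print(prediction)
```

The same bag-of-words counts could in principle feed the other classifiers the abstract lists (Support Vector Machine, Random Forest); Naive Bayes is shown here only because it fits in a few lines without external libraries.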

Keywords

Information extraction, classification, named entity recognition, topic modelling, keyphrase extraction, culture, policy, society

Note

not recorded

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Publication data

Number of pages

Defense date

30.03.2021.

Publication status

defended

Degree-granting institution

Place

Rijeka

Related records

Related persons

Sanda Martinčić-Ipšić (mentor)

Petar Kristijan Bogović (author)

Related institutions

Sveučilište u Rijeci, Fakultet informatike i digitalnih tehnologija (318) (author's institution)

Related projects

Izlučivanje ključnih riječi i sažimanje tekstova na temelju reprezentacije u mrežama jezika-LangNet (result of work on the project)

Measuring the Social Dimension of Culture (result of work on the project)

Field

Information and communication sciences, Computing