Bibliographic record number: 1033335
Automated Phonetic Transcription of Croatian Folklore Genres Using Supervised Machine Learning
Automated Phonetic Transcription of Croatian Folklore Genres Using Supervised Machine Learning // INFuture2019: Knowledge in the Digital Age / Bago, Petra ; Hebrang Grgić, Ivana ; Ivanjko, Tomislav ; Juričić, Vedran ; Miklošević, Željka ; Stublić, Helena (eds.).
Zagreb: Filozofski fakultet Sveučilišta u Zagrebu, 2019. pp. 129-133 doi:10.17234/INFUTURE.2019.16 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 1033335
Title
Automated Phonetic Transcription of Croatian Folklore Genres Using Supervised Machine Learning
Authors
Bakarić, Nikola ; Nikolić, Davor
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
INFuture2019: Knowledge in the Digital Age
/ Bago, Petra ; Hebrang Grgić, Ivana ; Ivanjko, Tomislav ; Juričić, Vedran ; Miklošević, Željka ; Stublić, Helena - Zagreb : Filozofski fakultet Sveučilišta u Zagrebu, 2019, 129-133
Conference
7th International Conference The Future of Information Sciences (INFuture 2019)
Place and date
Zagreb, Croatia, 21.11.2019 - 22.11.2019
Type of participation
Lecture
Type of review
International peer review
Keywords
text transcription ; automation ; natural language processing ; supervised learning ; Croatian folklore genres
Abstract
This paper explores the possibilities of automatic text transcription for preparing a corpus for further natural language processing analysis. The corpus contains various Croatian folklore genres. The goal of the transcription is to have one character represent one phoneme and to remove the spaces between accented and unaccented words. This knowledge-independent system is trained using supervised learning methods and applied to the rest of the corpus using classifiers such as naïve Bayes, k-nearest neighbour, support vector machine and others. The results are compared to a human-annotated sample to determine accuracy.
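The approach outlined in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes transcription is cast as per-character classification over a five-character context window, using a plain-Python k-nearest-neighbour classifier. The training pairs, the symbols `ʎ` and `ɲ`, the `_` deletion marker, the window weights, and all function names are hypothetical choices made for illustration.

```python
from collections import Counter

PAD = "#"      # padding symbol for positions outside the text
DELETE = "_"   # hypothetical marker: input character yields no output symbol
WEIGHTS = (1, 2, 8, 2, 1)  # assumed weights; the centre character dominates

# Hypothetical hand-aligned pairs (source, target of equal length).
# Digraphs collapse to one symbol; the space after a proclitic is deleted.
TRAIN = [
    ("ljubav", "ʎ_ubav"),
    ("polje", "poʎ_e"),
    ("konj", "koɲ_"),
    ("njiva", "ɲ_iva"),
    ("u polje", "u_poʎ_e"),
    ("na njivu", "na_ɲ_ivu"),
    ("konj ide", "koɲ_ ide"),
    ("jutro", "jutro"),
]

def windows(text):
    """Return one 5-character context window per character of text."""
    padded = PAD * 2 + text + PAD * 2
    return [tuple(padded[i:i + 5]) for i in range(len(text))]

def train(pairs):
    """Store (window, label) instances; k-NN simply memorises them."""
    model = []
    for source, target in pairs:
        assert len(source) == len(target), "pairs must be aligned"
        model.extend(zip(windows(source), target))
    return model

def distance(a, b):
    """Weighted Hamming distance between two context windows."""
    return sum(w for w, x, y in zip(WEIGHTS, a, b) if x != y)

def classify(window, model, k=1):
    """Majority label among the k nearest stored instances."""
    nearest = sorted(model, key=lambda item: distance(window, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def transcribe(text, model, k=1):
    """Classify every character and drop those labelled DELETE."""
    labels = (classify(w, model, k) for w in windows(text))
    return "".join(label for label in labels if label != DELETE)

def accuracy(predicted, gold):
    """Share of items whose transcription matches the human annotation."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

model = train(TRAIN)
```

With this toy data, `transcribe("na polje", model)` returns `napoʎe`: the digraph becomes a single symbol and the space after the proclitic is removed, although that exact phrase is not among the training pairs. The `accuracy` helper mirrors the comparison against a human-annotated sample at whole-item level; a per-character measure would be an equally plausible choice.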
Original language
English
Scientific fields
Information and communication sciences, Philology
WORK AFFILIATIONS
Institutions:
Filozofski fakultet, Zagreb
Veleučilište Velika Gorica