Normalization of Non-Standard Words in Croatian Texts (CROSBI ID 592844)
Conference paper in proceedings | original scientific paper | international peer review
Responsibility data
Beliga, Slobodan ; Pobar, Miran ; Martinčić-Ipšić, Sanda
English
Normalization of Non-Standard Words in Croatian Texts
This paper presents text normalization, an integral part of any text-to-speech synthesis system. Text normalization is a set of methods whose task is to write non-standard words, such as numbers, dates, times, abbreviations, acronyms, and the most common symbols, in their fully expanded form. A complete taxonomy for classifying non-standard words in Croatian is proposed, together with rule-based normalization methods combined with a lookup dictionary. The achieved token rate for normalization of Croatian texts is 95%, with 80% of the expanded words in the correct morphological form.
text normalization; non-standard words; text-to-speech
Student Section
not recorded
not recorded
not recorded
not recorded
not recorded
Contribution data
1-8.
2011.
published
Parent publication data
Text, Speech and Dialogue extension to Lecture Notes in Artificial Intelligence LNAI 6836
Habernal, Ivan ; Matoušek, Václav
Plzeň: University of West Bohemia
978-80-261-0069-0
Conference data
Text, Speech and Dialogue
lecture
01.09.2011-05.09.2011
Plzeň, Czech Republic