Bibliographic record number: 357927
Distributions of Automatically Segmented Phonemes in Croatian Speech
Distributions of Automatically Segmented Phonemes in Croatian Speech // 6. Znanstveni skup s međunarodnim sudjelovanjem "Istraživanja govora" / G. Varošanec-Škarić ; D. Horga (eds.).
Zagreb, 2007, pp. 83-85 (lecture, international peer review, abstract, scientific)
CROSBI ID: 357927
Title
Distributions of Automatically Segmented Phonemes in Croatian Speech
Authors
Martinčić-Ipšić, Sanda ; Grzybek, Peter ; Mačutek, Jan ; Matešić, Mihaela
Type, subtype and category of work
Conference abstracts, abstract, scientific
Source
6. Znanstveni skup s međunarodnim sudjelovanjem "Istraživanja govora"
/ G. Varošanec-Škarić ; D. Horga - Zagreb, 2007, 83-85
ISBN
978-953-7067-82-3
Conference
6. Znanstveni skup s međunarodnim sudjelovanjem "Istraživanja govora"
Place and date
Zagreb, Croatia, 06.12.2007 - 08.12.2007
Type of participation
Lecture
Type of review
International peer review
Keywords
automatic speech segmentation; distributions
Abstract
In this paper we describe an automatic segmentation procedure for Croatian speech, which is based on a monophone speech recognition system and on word-level transcriptions of speech signals. Since the speech files are transcribed at the word level, the utterances have to be segmented at the phone level for the training procedures. For the word segmentation and recognition task, we have developed a phonetic dictionary in which a set of phonetic symbols is used to transcribe the words from the Croatian speech database; the selected phonemes are derived from the SAMPA symbols proposed for Croatian [2]. The phonetic dictionary comprises all words that occur in the Croatian speech corpora, including all inflected word forms, together with their phonetic transcriptions [8]. The Croatian orthographic-to-phonetic rules are used for automatic conversion of graphemes into phonemes. The initial phone-level segmentation of speech is performed using automatic alignment of speech signals and word transcriptions, which is based on monophone hidden Markov models (HMMs) [3]. The automatic segmentation is performed by forced alignment of the spoken utterance with the corresponding transcription, using the monophone speech recognizer. The forced alignment assumes that all phones in the utterance are initially segmented into intervals of equal length. The monophone models were trained by iterations of the Baum-Welch algorithm [4]. The Viterbi algorithm was used to find the most likely sequence of HMM states [4]. The results of the Viterbi algorithm are automatically determined time intervals of the spoken phones in the speech signals. The automatically segmented phones are used as input for the speech recognition and speech synthesis training procedures.
Since we use HMMs for acoustic modeling of Croatian speech in speech recognition as well as in speech synthesis, the same automatic segmentation procedure was performed and the same automatically segmented phones are used for training the acoustic models of both systems [7]. Automatic segmentation results are presented for 13 hours of speech from 25 professional speakers. The indirect measures used to assess automatic segmentation performance are phoneme recognition correctness and word recognition correctness and accuracy. Additionally, Croatian phoneme durations were calculated from the automatically segmented phones. The calculated durations of 674746 phones were used to test a theoretical model for phoneme duration, based on Altmann's [9] findings for vowel duration. A first attempt is made to apply this model to standard Croatian data and to extend it to consonants.
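The segmentation steps named in the abstract (equal initial segmentation, Viterbi forced alignment, phone durations) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one HMM state per phone, a strict left-to-right topology without skips, no transition probabilities, and a 10 ms frame shift; the function names and the `loglik` matrix layout are hypothetical and chosen only for illustration.

```python
def uniform_segmentation(n_frames, n_phones):
    """Flat start: split the frames into n_phones nearly equal intervals."""
    bounds = [i * n_frames // n_phones for i in range(n_phones + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(n_phones)]


def forced_align(loglik):
    """Viterbi forced alignment against a known phone sequence.

    loglik[t][j] is the log-likelihood of frame t under the j-th phone of
    the transcription (assumed layout). Returns one (start, end) frame
    interval per phone, with end exclusive.
    """
    n_frames, n_phones = len(loglik), len(loglik[0])
    if n_frames < n_phones:
        raise ValueError("need at least one frame per phone")
    neg_inf = float("-inf")
    score = [[neg_inf] * n_phones for _ in range(n_frames)]
    back = [[0] * n_phones for _ in range(n_frames)]
    score[0][0] = loglik[0][0]  # the path must start in the first phone
    for t in range(1, n_frames):
        for j in range(n_phones):
            stay = score[t - 1][j]
            move = score[t - 1][j - 1] if j > 0 else neg_inf
            if move > stay:
                score[t][j], back[t][j] = move + loglik[t][j], j - 1
            else:
                score[t][j], back[t][j] = stay + loglik[t][j], j
    # Backtrace from the last phone at the last frame.
    path = [n_phones - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    # Collapse the frame-level path into per-phone intervals.
    segments, start = [], 0
    for t in range(1, n_frames + 1):
        if t == n_frames or path[t] != path[t - 1]:
            segments.append((start, t))
            start = t
    return segments


def mean_durations(labels, segments, frame_ms=10.0):
    """Mean duration in ms per phone label (frame_ms is an assumed value)."""
    totals, counts = {}, {}
    for label, (start, end) in zip(labels, segments):
        totals[label] = totals.get(label, 0.0) + (end - start) * frame_ms
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


# Toy utterance: 6 frames, 2 phones; frames 0-3 fit the first phone best.
toy_loglik = [[0.0, -5.0]] * 4 + [[-5.0, 0.0]] * 2
toy_segments = forced_align(toy_loglik)
toy_durations = mean_durations(["a", "t"], toy_segments)
```

In a real system the frame log-likelihoods would come from the trained monophone models, and each phone would typically span several HMM states with learned transition probabilities; the sketch only shows how a known phone sequence constrains the Viterbi search and how durations follow from the resulting intervals.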
Original language
English
Scientific fields
Information and communication sciences
WORK AFFILIATIONS
Projects:
009-0361935-0852 - Govorne tehnologije (Speech Technologies)
Institutions:
Filozofski fakultet, Rijeka