Bibliographic record number: 653961
A Comparison of Two Approaches to Bilingual HMM-Based Speech Synthesis
A Comparison of Two Approaches to Bilingual HMM-Based Speech Synthesis // Text, Speech, and Dialogue / Habernal, Ivan ; Matoušek, Václav (eds.).
Plzeň: Springer, 2013. pp. 44-51 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 653961
Title
A Comparison of Two Approaches to Bilingual HMM-Based Speech Synthesis
Authors
Pobar, Miran ; Justin, Tadej ; Žibert, Janez ; Mihelič, France ; Ipšić, Ivo
Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Text, Speech, and Dialogue
/ Habernal, Ivan ; Matoušek, Václav - Plzeň : Springer, 2013, 44-51
ISBN
978-3-642-40584-6
Conference
16th International Conference, TSD 2013, Pilsen, Czech Republic, September 1-5, 2013. Proceedings
Place and date
Brno, Czech Republic, 01.09.2013 - 05.09.2013
Type of participation
Lecture
Type of review
International peer review
Keywords
bilingual; HMM; speech synthesis; phoneme mapping; state mapping; speaker adaptation; Kullback-Leibler divergence
Abstract
We compare two approaches to using cross-lingual data from different speakers to build bilingual speech synthesis systems capable of producing speech with the same speaker identity. The first approach treats data from both languages as monolingual by labeling all data with a manually joined phoneme set. A speaker-independent voice is trained on the joined data and adapted to the target speaker using CMLLR adaptation. In the second approach, speaker-independent voices are trained for each language separately. A state mapping between these voices is derived automatically from the minimum Kullback–Leibler divergence between state distributions. The mapping is used to apply the adaptation transformations calculated within one language to the speaker-independent voice of the other language. We evaluate speech quality on the MOS scale, and the similarity of the synthesized speech to the target speaker using DMOS, on the example of the Croatian-Slovene language pair.
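The state-mapping step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes each HMM state distribution is a single diagonal-covariance Gaussian and uses the symmetric form of the Kullback–Leibler divergence as the distance. The names `kl_diag_gauss`, `symmetric_kl` and `map_states` are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: a state is one diagonal-covariance Gaussian, stored as
# (means, variances). Real HMM voices may use richer state distributions.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def kl_diag_gauss(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) for two diagonal-covariance Gaussians of equal dimension."""
    mean_p, var_p = p
    mean_q, var_q = q
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """Symmetric divergence: KL(p || q) + KL(q || p) (an assumed choice)."""
    return kl_diag_gauss(p, q) + kl_diag_gauss(q, p)


def map_states(source: List[Gaussian], target: List[Gaussian]) -> List[int]:
    """For each source state, return the index of the closest target state."""
    return [
        min(range(len(target)), key=lambda j: symmetric_kl(s, target[j]))
        for s in source
    ]


# Toy one-dimensional states for two hypothetical speaker-independent voices.
voice_a = [([0.0], [1.0]), ([5.0], [1.0])]
voice_b = [([4.8], [1.0]), ([0.2], [1.0])]
mapping = map_states(voice_a, voice_b)
```

In this toy setup each state of `voice_a` is paired with the state of `voice_b` whose distribution is nearest. An adaptation transform estimated for a state in one voice could then be looked up through `mapping` and applied to the paired state in the other voice.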
Original language
English
Scientific fields
Computer science, Information and communication sciences
WORK AFFILIATIONS
Projects:
318-0361935-0852 - Govorne tehnologije (Speech Technologies) (Ipšić, Ivo, MZOS) (CroRIS)
Institutions:
Fakultet informatike i digitalnih tehnologija, Rijeka
Indexed in:
- Scopus