Rapid Prototyping of a Croatian Large Vocabulary Continuous Speech Recognition System (CROSBI ID 604564)
Conference paper in proceedings | original scientific paper | international peer review
Responsibility details
Bajo, Dario ; Turković, Danijel ; Dembitz, Šandor
English
Rapid Prototyping of a Croatian Large Vocabulary Continuous Speech Recognition System
The Croatian language, like many minority languages used by less than 0.1% of the world population, is in need of mature automatic speech recognition (ASR) systems for applications such as transcription of speech recordings, voice control, and aids for people with impairments. This paper describes a short-term research and development project aimed at producing a usable Croatian large vocabulary continuous speech recognition system from scratch. The open-source CMU Sphinx toolkit was our platform of choice. For acoustic model training, we built a speech training set of several hundred utterances, containing words carefully chosen according to their phonetic properties. Language models were derived from the Croatian large-scale n-gram system, which ensures the system’s applicability. During the project, we succeeded in developing an ASR system able to recognize, reasonably well, freely chosen utterances composed of the 15,000 most frequently used Croatian words.
automatic speech recognition; continuous speech; large-scale n-gram model; large vocabulary.
Contribution details
13-18.
2013.
published
Host publication details
INFOCOMP 2013
Rückemann, Claus-Peter ; Pankowska, Malgorzata
Lisbon: International Academy, Research, and Industry Association (IARIA)
978-1-61208-310-0
Conference details
The Third International Conference on Advanced Communications and Computation
lecture
17.11.2013-22.11.2013
Lisbon, Portugal