Bibliographic record no.: 451668
Perceptual Significance of Cepstral Distortion Measures in Digital Speech Processing
Perceptual Significance of Cepstral Distortion Measures in Digital Speech Processing // Automatika : časopis za automatiku, mjerenje, elektroniku, računarstvo i komunikacije, 52 (2011), 2; 132-146 (international peer review, article, scientific)
CROSBI ID: 451668
Title
Perceptual Significance of Cepstral Distortion Measures in Digital Speech Processing
Authors
Vasilijević, Antonio ; Petrinović, Davor
Source
Automatika : časopis za automatiku, mjerenje, elektroniku, računarstvo i komunikacije (0005-1144) 52 (2011), 2; 132-146
Type, subtype and category of work
Journal papers, article, scientific
Keywords
Aliasing ; Digital speech processing ; MFCC ; Mel cepstrum ; SD measure ; Speech recognition ; Objective assessment ; Recognition ; Quality
Abstract
Currently, one of the most widely used distance measures in speech and speaker recognition is the Euclidean distance between mel frequency cepstral coefficients (MFCC). MFCCs are based on a filter bank algorithm whose filters are equally spaced on a perceptually motivated mel frequency scale. The value of the mel cepstral vector, as well as the properties of the corresponding cepstral distance, are determined by several parameters used in mel cepstral analysis. The aim of this work is to examine the compatibility of the MFCC measure with human perception for different values of the analysis parameters. By analysing the mel filter bank parameters, it is found that a filter bank with 24 bands, a bandwidth of 220 mels and a band overlap coefficient equal to or higher than one gives optimal spectral distortion (SD) distance measures. For this kind of mel filter bank, the difference between vowels can be recognised when the full-length mel cepstral SD RMS measure is higher than 0.4 to 0.5 dB. Furthermore, we show that the use of a truncated mel cepstral vector (12 coefficients) is justified for speech recognition, but may be arguable for speaker recognition. We also analysed the impact of aliasing in the cepstral domain on cepstral distortion measures. The results showed a high correlation between SD distances calculated from the aperiodic and periodic mel cepstrum, leading to the conclusion that the impact of aliasing is generally minor. There are rare exceptions where aliasing is present, and these were also analysed.
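As a rough illustration of two ideas mentioned in the abstract (filters equally spaced on a mel scale, and a Euclidean distance between full-length or truncated mel cepstral vectors), the following minimal sketch may help. It is not the authors' implementation. The mel formula (2595 log10(1 + f/700)), the function names, and the 0 to 4000 Hz band in the usage note are assumptions made for illustration only, and the sketch does not model filter bandwidth, band overlap, or the conversion of the distance to an SD value in dB.

```python
import math


def hz_to_mel(f_hz):
    # One commonly used mel formula (an assumption here; the paper may use another).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_bands, f_low_hz, f_high_hz):
    """Centre frequencies in Hz of n_bands filters equally spaced in mel."""
    m_low = hz_to_mel(f_low_hz)
    m_high = hz_to_mel(f_high_hz)
    step = (m_high - m_low) / (n_bands + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_bands)]


def cepstral_distance(c_a, c_b, n_coeffs=None):
    """Euclidean distance between two cepstral vectors.

    If n_coeffs is given, only the first n_coeffs coefficients are compared,
    which mimics the use of a truncated cepstral vector.
    """
    if len(c_a) != len(c_b):
        raise ValueError("cepstral vectors must have the same length")
    n = len(c_a) if n_coeffs is None else n_coeffs
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c_a[:n], c_b[:n])))
```

For example, `mel_center_frequencies(24, 0.0, 4000.0)` returns 24 centre frequencies for a hypothetical 4 kHz analysis band, and `cepstral_distance(c_a, c_b, n_coeffs=12)` compares only the first 12 coefficients. Because truncation drops non-negative squared terms, the truncated distance can never exceed the full-length one.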
Original language
English
Scientific fields
Electrical engineering, Computing
The journal is indexed in:
- Web of Science Core Collection (WoSCC)
- Science Citation Index Expanded (SCI-EXP)
- SCI-EXP, SSCI and/or A&HCI
- Scopus
Included in other bibliographic databases:
- INSPEC
- Science Citation Index Expanded
- Chemical Abstracts
- Current Bibliography on Science and Technology
- Science Abstracts
- Referativnii Zurnal