Bibliographic record overview, number: 1197683
Deep Convolutional Oscillator: Synthesizing Waveforms from Timbral Descriptors
Deep Convolutional Oscillator: Synthesizing Waveforms from Timbral Descriptors // Proceedings of the 19th Sound and Music Computing Conference
Saint-Étienne, France, 2022, pp. 200-206, doi:10.5281/zenodo.6573045 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 1197683
Title
Deep Convolutional Oscillator: Synthesizing Waveforms from Timbral Descriptors
Authors
Kreković, Gordan
Type, subtype, and category of work
Papers in conference proceedings, full paper (in extenso), scientific
Source
Proceedings of the 19th Sound and Music Computing Conference
2022, pp. 200-206
Conference
Sound and Music Computing 2022 (SMC-22)
Place and date
Saint-Étienne, France, 5-12 June 2022
Type of participation
Lecture
Type of review
International peer review
Keywords
sound synthesis; wavetable synthesis; deep learning; convolutional neural networks; timbral attributes; generative neural network
Abstract
This paper presents a novel deep learning model for synthesizing single-cycle waveforms from timbral attributes. The motivation was to investigate a viable alternative to traditional wavetable oscillators with intuitive control. Based on a thorough literature review and practical considerations, we selected three attributes appropriate for describing the timbral characteristics of steady, harmonic tones: bright, warm, and rich. A deep learning network was designed to map the magnitudes of these attributes to single-cycle waveforms. The architecture stacks upsampling and convolutional layers to model temporal dependencies within the waveform. The network was trained on a large number of waveforms extracted from the NSynth dataset. Audio features closely related to the selected attributes were used as inputs, while a custom loss function was employed to minimize the difference in normalized power spectra between the outputs and the training waveforms. Four models with different hyperparameters were trained, and the best one was selected using the validation dataset. Further experiments with the selected model showed that the synthesized waveforms generally match the input attributes well: on the testing dataset, the mean absolute errors for the normalized attributes were 0.07, 0.05, and 0.18 for bright, warm, and rich, respectively.
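The abstract mentions a loss that compares normalized power spectra of the generated and training waveforms without giving its exact form. The following is a minimal sketch of one way such a loss could look, not the paper's implementation: the function names (`power_spectrum`, `normalize`, `spectral_loss`), the normalization by total power, and the mean squared distance are all assumptions made for illustration.

```python
import cmath
import math


def power_spectrum(wave):
    """Power of DFT bins 0..N/2 for one waveform cycle (naive O(N^2) DFT)."""
    n = len(wave)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(wave)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def normalize(spectrum, eps=1e-12):
    """Scale the spectrum to unit total power (assumed normalization)."""
    total = sum(spectrum)
    return [p / (total + eps) for p in spectrum]


def spectral_loss(predicted, target):
    """Mean squared difference of normalized power spectra (assumed distance)."""
    p = normalize(power_spectrum(predicted))
    q = normalize(power_spectrum(target))
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


# Hypothetical single-cycle waveforms of 64 samples each.
n = 64
sine = [math.sin(2 * math.pi * t / n) for t in range(n)]
quiet_sine = [0.25 * x for x in sine]  # same shape, lower amplitude
second_harmonic = [math.sin(4 * math.pi * t / n) for t in range(n)]

loss_same_shape = spectral_loss(sine, quiet_sine)
loss_different = spectral_loss(sine, second_harmonic)
```

Under these assumptions the loss ignores overall amplitude and phase, so `loss_same_shape` is close to zero, while `loss_different` is clearly positive because the energy sits in a different harmonic. In a training setup the same computation would normally be written with a differentiable FFT from the chosen deep learning framework.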
Original language
English
Scientific fields
Computing