Overview of bibliographic record no. 1201776
Multimodal Emotion Analysis Based on Visual, Acoustic and Linguistic Features // Social Computing and Social Media: Design, User Experience and Impact. HCII 2022. Lecture Notes in Computer Science, vol 13315. / Meiselwitz, G. (ed.).
Online: Springer, 2022, pp. 318-331. doi:10.1007/978-3-031-05061-9_23 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 1201776
Title
Multimodal Emotion Analysis Based on Visual, Acoustic and Linguistic Features
Authors
Koren, Leon ; Stipancic, Tomislav ; Ricko, Andrija ; Orsag, Luka
Type, subtype and category of work
Conference proceedings papers, full paper (in extenso), scientific
Source
Social Computing and Social Media: Design, User Experience and Impact. HCII 2022. Lecture Notes in Computer Science, vol 13315.
/ Meiselwitz, G. (ed.) - Online: Springer, 2022, 318-331
ISBN
978-3-031-05060-2
Conference
International Conference on Human-Computer Interaction (HCII 2022)
Place and date
Online, 26.06.2022 - 01.07.2022
Type of participation
Lecture
Type of review
International peer review
Keywords
Nonverbal behavior ; Multimodal interaction ; Artificial intelligence ; Cognitive Robotics ; Social Signal Processing
Abstract
In this paper, a computational reasoning framework is presented that interprets the social signals of a person in interaction, focusing on the person’s emotional state. Two distinct sources of social signals are used in this study: the facial and voice emotion modalities. In the first modality, a Convolutional Neural Network (CNN) extracts and processes facial features from a live video stream. Voice emotion analysis comprises two sub-modalities driven by CNN and Long Short-Term Memory (LSTM) networks, which analyze the acoustic and linguistic features of the voice to identify possible emotional cues of the person in interaction. Through multimodal information fusion, the system then combines the modality outputs into a single hypothesis. The results of this reasoning are used to autonomously generate robot responses, shown as non-verbal facial animations projected onto the ‘face’ surface of the affective robot head PLEA. Built-in functionalities of the robot can provide a degree of situational embodiment, self-explainability and context-driven interaction.
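The abstract describes three modality-specific networks (a facial CNN, an acoustic CNN and a linguistic LSTM) whose outputs are fused into a single emotion hypothesis. The sketch below is a minimal PyTorch illustration of that general idea, not the authors' implementation: the layer sizes, input shapes, the 7-class emotion set and the fusion weights are all assumptions introduced for illustration.

```python
# Minimal sketch of a three-branch emotion classifier with decision-level
# fusion. All architectural details are illustrative assumptions, not
# values taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_EMOTIONS = 7  # assumption: e.g. six basic emotions + neutral

class FaceCNN(nn.Module):
    """Toy CNN over 48x48 grayscale face crops (visual modality)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(32 * 12 * 12, NUM_EMOTIONS)

    def forward(self, x):                       # x: (B, 1, 48, 48)
        return self.fc(self.conv(x).flatten(1))  # unnormalised class scores

class AcousticCNN(nn.Module):
    """Toy 1-D CNN over MFCC-like feature sequences (acoustic sub-modality)."""
    def __init__(self, n_mfcc=40):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mfcc, 64, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(64, NUM_EMOTIONS)

    def forward(self, x):                       # x: (B, n_mfcc, T)
        return self.fc(self.conv(x).squeeze(-1))

class LinguisticLSTM(nn.Module):
    """Toy LSTM over word-embedding sequences (linguistic sub-modality)."""
    def __init__(self, emb_dim=100, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, NUM_EMOTIONS)

    def forward(self, x):                       # x: (B, T, emb_dim)
        _, (h_n, _) = self.lstm(x)
        return self.fc(h_n[-1])                 # last hidden state -> scores

def fuse(face_logits, acoustic_logits, linguistic_logits,
         weights=(0.4, 0.3, 0.3)):
    """Decision-level fusion: weighted average of per-modality distributions."""
    probs = [F.softmax(l, dim=-1)
             for l in (face_logits, acoustic_logits, linguistic_logits)]
    fused = sum(w * p for w, p in zip(weights, probs))
    return fused.argmax(dim=-1), fused          # single emotion hypothesis

if __name__ == "__main__":
    face, audio, text = FaceCNN(), AcousticCNN(), LinguisticLSTM()
    B = 2  # batch of two synthetic samples
    hypothesis, distribution = fuse(face(torch.randn(B, 1, 48, 48)),
                                    audio(torch.randn(B, 40, 120)),
                                    text(torch.randn(B, 25, 100)))
    print(hypothesis, distribution.shape)
```

Decision-level (late) fusion with fixed weights is used here only because it is the simplest way to combine three independent classifiers into one hypothesis; the fusion strategy actually used in the paper may differ.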
Original language
English
Scientific fields
Electrical Engineering, Computing, Mechanical Engineering, Information and Communication Sciences
RELATED INFORMATION
Projects:
HRZZ-UIP-2020-02-7184 - Affective Multimodal Interaction based on Constructed Robot Cognition (AMICORC) (Stipančić, Tomislav, HRZZ - 2020-02) (CroRIS)
Institutions:
Fakultet strojarstva i brodogradnje, Zagreb
Indexed in:
- Scopus