Overview of bibliographic record no. 1201776
Multimodal Emotion Analysis Based on Visual, Acoustic and Linguistic Features // Social Computing and Social Media: Design, User Experience and Impact. HCII 2022. Lecture Notes in Computer Science, vol 13315. / Meiselwitz, G. (ed.).
Online: Springer, 2022, pp. 318-331. doi:10.1007/978-3-031-05061-9_23 (lecture, international peer review, full paper (in extenso), scientific)
CROSBI ID: 1201776
Title
Multimodal Emotion Analysis Based on Visual, Acoustic and Linguistic Features
Authors
Koren, Leon ; Stipancic, Tomislav ; Ricko, Andrija ; Orsag, Luka
Type, subtype and category of work
Conference proceedings papers, full paper (in extenso), scientific
Source
Social Computing and Social Media: Design, User Experience and Impact. HCII 2022. Lecture Notes in Computer Science, vol 13315.
/ Meiselwitz, G. (ed.) - Online: Springer, 2022, 318-331
ISBN
978-3-031-05060-2
Conference
International Conference on Human-Computer Interaction (HCII 2022)
Place and date
Online, 26.06.2022 - 01.07.2022
Type of participation
Lecture
Type of review
International peer review
Keywords
Nonverbal behavior ; Multimodal interaction ; Artificial intelligence ; Cognitive Robotics ; Social Signal Processing
Abstract
In this paper, a computational reasoning framework is presented that interprets the social signals of a person in interaction, focusing on the person’s emotional state. Two distinct sources of social signals are used in this study: the facial and voice emotion modalities. In the first modality, a Convolutional Neural Network (CNN) extracts and processes facial features from a live video stream. Voice emotion analysis comprises two sub-modalities driven by CNN and Long Short-Term Memory (LSTM) networks, which analyze the acoustic and linguistic features of the voice to identify possible emotional cues of the person in interaction. Through multimodal information fusion, the system then combines the modality outputs into a single hypothesis. The results of this reasoning are used to autonomously generate robot responses, shown as non-verbal facial animations projected onto the ‘face’ surface of the affective robot head PLEA. Built-in functionalities of the robot can provide a degree of situational embodiment, self-explainability and context-driven interaction.
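The abstract describes three modality-specific networks (a facial CNN, an acoustic CNN and a linguistic LSTM) whose outputs are fused into a single emotion hypothesis. The sketch below is a minimal PyTorch illustration of that general idea, not the authors' implementation: the layer sizes, input shapes, the 7-class emotion set and the fusion weights are all assumptions introduced for illustration.

```python
# Minimal sketch of a three-branch emotion classifier with decision-level
# fusion. All architectural details are illustrative assumptions, not
# values taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_EMOTIONS = 7  # assumption: e.g. six basic emotions + neutral

class FaceCNN(nn.Module):
    """Toy CNN over 48x48 grayscale face crops (visual modality)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(32 * 12 * 12, NUM_EMOTIONS)

    def forward(self, x):                       # x: (B, 1, 48, 48)
        return self.fc(self.conv(x).flatten(1))  # unnormalised class scores

class AcousticCNN(nn.Module):
    """Toy 1-D CNN over MFCC-like feature sequences (acoustic sub-modality)."""
    def __init__(self, n_mfcc=40):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mfcc, 64, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(64, NUM_EMOTIONS)

    def forward(self, x):                       # x: (B, n_mfcc, T)
        return self.fc(self.conv(x).squeeze(-1))

class LinguisticLSTM(nn.Module):
    """Toy LSTM over word-embedding sequences (linguistic sub-modality)."""
    def __init__(self, emb_dim=100, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, NUM_EMOTIONS)

    def forward(self, x):                       # x: (B, T, emb_dim)
        _, (h_n, _) = self.lstm(x)
        return self.fc(h_n[-1])                 # last hidden state -> scores

def fuse(face_logits, acoustic_logits, linguistic_logits,
         weights=(0.4, 0.3, 0.3)):
    """Decision-level fusion: weighted average of per-modality distributions."""
    probs = [F.softmax(l, dim=-1)
             for l in (face_logits, acoustic_logits, linguistic_logits)]
    fused = sum(w * p for w, p in zip(weights, probs))
    return fused.argmax(dim=-1), fused          # single emotion hypothesis

if __name__ == "__main__":
    face, audio, text = FaceCNN(), AcousticCNN(), LinguisticLSTM()
    B = 2  # batch of two synthetic samples
    hypothesis, distribution = fuse(face(torch.randn(B, 1, 48, 48)),
                                    audio(torch.randn(B, 40, 120)),
                                    text(torch.randn(B, 25, 100)))
    print(hypothesis, distribution.shape)
```

Decision-level (late) fusion with fixed weights is used here only because it is the simplest way to combine three independent classifiers into one hypothesis; the fusion strategy actually used in the paper may differ.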
Original language
English
Scientific fields
Electrical Engineering, Computing, Mechanical Engineering, Information and Communication Sciences
RELATED INFORMATION
Projects:
HRZZ-UIP-2020-02-7184 - Affective Multimodal Interaction based on Constructed Robot Cognition (AMICORC) (Stipančić, Tomislav, HRZZ - 2020-02) (CroRIS)
Institutions:
Fakultet strojarstva i brodogradnje, Zagreb
Indexed in:
- Scopus