Person localization model based on a fusion of acoustic and visual inputs

Koren, Leon; Stipancic, Tomislav; Ricko, Andrija; Orsag, Luka

doi:10.3390/electronics11030440

Bibliographic record number: 1175659

Person localization model based on a fusion of acoustic and visual inputs // Electronics (Basel), 11 (2022), 3; 440, 13 doi:10.3390/electronics11030440 (international peer review, article, scientific)

CROSBI ID: 1175659

Title
Person localization model based on a fusion of acoustic and visual inputs

Authors
Koren, Leon ; Stipancic, Tomislav ; Ricko, Andrija ; Orsag, Luka

Source
Electronics (Basel) (2079-9292) 11 (2022), 3; 440, 13

Type, subtype and category of work
Journal papers, article, scientific

Keywords
spatial location ; residual neural network ; digital filter ; person separation ; cognitive robotics ; multimodal signal processing ; sensors ; HRI

Abstract
PLEA is an interactive, biomimetic robotic head with non-verbal communication capabilities. PLEA's reasoning is based on a multimodal approach that combines video and audio inputs to infer the current emotional state of the person. PLEA expresses emotions using facial expressions generated in real time and projected onto a 3D face projection surface. In this paper, a more sophisticated computation mechanism is developed and evaluated. The Model for Audio-Visual Person Separation can locate a talking person in a crowded place by combining the output of a ResNet network with the output of a hand-crafted algorithm. The first input is used to find human faces in the room, while the second is used to determine the direction of the sound and to focus attention on a single person. After an information fusion procedure is performed, the face of the person speaking is matched with the corresponding sound direction. As a result of this procedure, the robot can start an interaction with the person based on non-verbal signals. The model is tested and evaluated under laboratory conditions in interaction with users. The results suggest that the methodology can be used efficiently to focus a robot's attention on the localized person.
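
The fusion step described in the abstract (matching a detected face with the estimated sound direction) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give the fusion procedure, so the pinhole-camera conversion, the function names `face_azimuth_deg` and `match_speaker`, and the default image width, field of view, and tolerance are all assumptions chosen for illustration.

```python
import math
from typing import Optional, Sequence


def face_azimuth_deg(x_center: float, image_width: int, hfov_deg: float) -> float:
    """Horizontal angle of a pixel column relative to the optical axis.

    Assumes an ideal pinhole camera (hypothetical; not stated in the record).
    """
    focal_px = (image_width / 2) / math.tan(math.radians(hfov_deg) / 2)
    return math.degrees(math.atan((x_center - image_width / 2) / focal_px))


def match_speaker(
    face_centers_x: Sequence[float],
    sound_azimuth_deg: float,
    image_width: int = 640,       # assumed camera resolution
    hfov_deg: float = 60.0,       # assumed horizontal field of view
    tolerance_deg: float = 10.0,  # assumed maximum acceptable mismatch
) -> Optional[int]:
    """Return the index of the face closest to the sound direction.

    Returns None when no face lies within the tolerance.
    """
    best_index, best_error = None, tolerance_deg
    for i, x in enumerate(face_centers_x):
        error = abs(face_azimuth_deg(x, image_width, hfov_deg) - sound_azimuth_deg)
        if error <= best_error:
            best_index, best_error = i, error
    return best_index
```

For example, with faces centered at pixel columns 0, 320, and 640 and a sound source estimated at 28 degrees to the right, `match_speaker([0, 320, 640], 28.0)` selects the third face (index 2) under the assumed camera parameters.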

Original language
English

Scientific fields
Computing, Mechanical Engineering, Interdisciplinary Technical Sciences

RELATED TO THIS WORK

Projects:
HRZZ-UIP-2020-02-7184 - Affective multimodal interaction based on constructed robotic cognition (AMICORC) (Stipančić, Tomislav, HRZZ - 2020-02) (CroRIS)

Institutions:
Faculty of Mechanical Engineering and Naval Architecture, Zagreb

Profiles:

Andrija Ričko (author)