In Defense of Pre-Trained ImageNet Architectures for Real-Time Semantic Segmentation of Road-Driving Images

Oršić, Marin; Krešo, Ivan; Bevandić, Petra; Šegvić, Siniša

Bibliographic record number: 1042694

In Defense of Pre-Trained ImageNet Architectures for Real-Time Semantic Segmentation of Road-Driving Images // 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Long Beach (CA), United States: Institute of Electrical and Electronics Engineers (IEEE), 2019. pp. 12607-12616 doi:10.1109/cvpr.2019.01289 (poster, international peer review, full paper (in extenso), scientific)

CROSBI ID: 1042694

Title
In Defense of Pre-Trained ImageNet Architectures for Real-Time Semantic Segmentation of Road-Driving Images

Authors
Oršić, Marin; Krešo, Ivan; Bevandić, Petra; Šegvić, Siniša

Type, subtype and category of work
Papers in conference proceedings, full paper (in extenso), scientific

Source
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Institute of Electrical and Electronics Engineers (IEEE), 2019, pp. 12607-12616

ISBN
978-1-7281-3293-8

Conference
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Place and date
Long Beach (CA), United States, 15.06.2019 - 20.06.2019

Type of participation
Poster

Type of review
International peer review

Keywords
Computer vision

Abstract
Recent success of semantic segmentation approaches on demanding road driving datasets has spurred interest in many related application fields. Many of these applications involve real-time prediction on mobile platforms such as cars, drones and various kinds of robots. Real-time setup is challenging due to extraordinary computational complexity involved. Many previous works address the challenge with custom lightweight architectures which decrease computational complexity by reducing depth, width and layer capacity with respect to general purpose architectures. We propose an alternative approach which achieves a significantly better performance across a wide range of computing budgets. First, we rely on a light-weight general purpose architecture as the main recognition engine. Then, we leverage light-weight upsampling with lateral connections as the most cost-effective solution to restore the prediction resolution. Finally, we propose to enlarge the receptive field by fusing shared features at multiple resolutions in a novel fashion. Experiments on several road driving datasets show a substantial advantage of the proposed approach, either with ImageNet pre-trained parameters or when we learn from scratch. Our Cityscapes test submission entitled SwiftNetRN-18 delivers 75.5% MIoU and achieves 39.9 Hz on 1024x2048 images on GTX1080Ti.
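The abstract mentions "light-weight upsampling with lateral connections" as the way prediction resolution is restored. A minimal sketch of one such step follows, for illustration only. It assumes nearest-neighbour upsampling and element-wise summation on a single-channel grid; neither choice is stated in the abstract, and the names upsample_nearest_2x and lateral_fuse are hypothetical. A real network would operate on multi-channel tensors with learned convolutions.

```python
from typing import List

Grid = List[List[float]]  # single-channel feature map (assumption for brevity)


def upsample_nearest_2x(x: Grid) -> Grid:
    """Double height and width by repeating each value (nearest neighbour)."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def lateral_fuse(coarse: Grid, lateral: Grid) -> Grid:
    """Upsample the coarse map and add the finer lateral map element-wise."""
    up = upsample_nearest_2x(coarse)
    if len(up) != len(lateral) or len(up[0]) != len(lateral[0]):
        raise ValueError("lateral map must be exactly 2x the coarse map")
    return [[u + l for u, l in zip(ur, lr)] for ur, lr in zip(up, lateral)]


# Hypothetical toy inputs: deep low-resolution and shallow high-resolution features.
coarse = [[1.0, 2.0]]                    # 1x2
lateral = [[0.1, 0.2, 0.3, 0.4],
           [0.5, 0.6, 0.7, 0.8]]         # 2x4
fused = lateral_fuse(coarse, lateral)    # 2x4, combines both sources
```

The sketch shows the general idea: coarse, semantically rich features are brought up to the resolution of an earlier layer and combined with that layer's output, so spatial detail is recovered without a heavy decoder.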

Original language
English

Scientific fields
Computing

RELATED ENTITIES

Institutions:
Faculty of Electrical Engineering and Computing, Zagreb

Profiles:

Ivan Krešo (author)