Convolutional architecture for efficient semantic segmentation of large images

Ivan Krešo

Data source: CROSBI

Convolutional architecture for efficient semantic segmentation of large images (CROSBI ID 442804)

Thesis | doctoral dissertation

Ivan Krešo. Convolutional architecture for efficient semantic segmentation of large images / Siniša Šegvić (mentor); Zagreb, Fakultet elektrotehnike i računarstva, 2021

Responsibility

Authors

Ivan Krešo

Mentors

Siniša Šegvić

Basic data in the original language
Basic data in other languages

Language

English

Title

Convolutional architecture for efficient semantic segmentation of large images

Abstract

This thesis investigates semantic segmentation of large natural images. We focus on images recorded with a camera mounted on a vehicle. Such images do not suffer from photographer bias and therefore usually contain examples that are harder to generalize to. For semantic segmentation, this means that objects appear at a wide range of scales, so the method must work well both for small objects far from the camera and for large objects near it. The focus of the thesis is on applying convolutional neural networks to semantic segmentation of large images in an efficient manner. The thesis starts with an introduction to the problem of semantic segmentation, explains its relation to the problem of object localization, and concludes the introduction by discussing the challenges of the problem. The next chapter introduces the required concepts from the field of machine learning. In particular, we review convolutional neural networks and compare the DenseNet and ResNet architectures. The main contributions of the thesis are as follows. First, we develop a scale-invariant convolutional model for semantic segmentation which alleviates the problem of learning the same object at a wide range of scales. We also contribute a new dataset for semantic segmentation of driving scenes, containing ground-truth semantic segmentations for 445 hand-picked images from the KITTI dataset. Second, we develop an efficient asymmetric architecture for dense prediction on large images, based on a densely connected feature extractor and lightweight ladder-style upsampling. Third, we present our results from the Robust Vision Challenge 2018, where we achieved second place. The challenge addressed cross-dataset and cross-domain training of dense prediction models. We found that mixed batches ensure the most stable evolution of batchnorm parameters. The low incidence of foreign classes indicates that our models succeeded in implicitly learning to distinguish the domains. Finally, the presented architecture is computationally and memory efficient and achieves a favorable trade-off between inference speed and generalization accuracy.

Keywords

computer vision, semantic segmentation

Note

not recorded

Language

not recorded

Title

not recorded

Abstract

not recorded

Keywords

not recorded

Note

not recorded

Publication data

Number of pages

104

Date of defense

1 July 2021

Publication status

defended

Degree-granting institution

Institution / Organization

Fakultet elektrotehnike i računarstva

Place

Zagreb

Related records

Related persons

Ivan Krešo (author)

Siniša Šegvić (mentor)

Related institutions

Fakultet elektrotehnike i računarstva (036) (author's institution)

Related projects

Napredna gusta predikcija za računalni vid (result of work on the project)

Field

Computing

Links

zemris.fer.hr