Konvolucijska arhitektura za učinkovitu semantičku segmentaciju velikih slika

Ivan Krešo

CROSBI bibliographic record no. 1142403

Konvolucijska arhitektura za učinkovitu semantičku segmentaciju velikih slika, 2021, doctoral dissertation, Fakultet elektrotehnike i računarstva, Zagreb

Title
Konvolucijska arhitektura za učinkovitu semantičku segmentaciju velikih slika
(Convolutional architecture for efficient semantic segmentation of large images)

Authors
Ivan Krešo

Type, subtype and category of work
Theses, doctoral dissertation

Faculty
Fakultet elektrotehnike i računarstva

Place
Zagreb

Date
1 July

Year
2021

Pages
104

Supervisor
Siniša Šegvić

Keywords
computer vision, semantic segmentation

Abstract
This thesis investigates the semantic segmentation of large natural images. We focus on images recorded with a camera mounted on a vehicle. Such images are free of photographer bias and therefore usually contain examples that are harder to generalize to. For semantic segmentation, this means that objects appear across a wide range of scales, so the method must work well both for small, distant objects and for large objects near the camera. The thesis focuses on applying convolutional neural networks to the semantic segmentation of large images in an efficient manner.

The thesis begins with an introduction to the problem of semantic segmentation, explains how it relates to the problem of object localization, and concludes the introduction with a discussion of the challenges of the problem. The next chapter introduces the required concepts from machine learning. In particular, we review convolutional neural networks and compare the DenseNet and ResNet architectures.

The main contributions of the thesis are as follows. First, we develop a scale-invariant convolutional model for semantic segmentation, which alleviates the problem of learning the same object across a wide range of scales. We also contribute a new dataset for semantic segmentation of driving scenes, which provides ground-truth semantic segmentations for 445 hand-picked images from the KITTI dataset. Second, we develop an efficient asymmetric architecture for dense prediction on large images, based on a densely connected feature extractor and lightweight ladder-style upsampling. Third, we present our results from the Robust Vision Challenge 2018, where we placed second. The challenge addressed cross-dataset and cross-domain training of dense prediction models. We found that mixed batches (batches that combine images from different datasets) ensure the most stable evolution of the batch normalization parameters. The low incidence of foreign classes indicates that our models succeeded in implicitly learning to distinguish the domains. Finally, the presented architecture is efficient in both computation and memory, and achieves a favorable trade-off between inference speed and generalization accuracy.
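The abstract only names the building blocks of the asymmetric architecture. As a rough illustration of what one ladder-style upsampling step can look like, here is a minimal NumPy sketch under our own assumptions: nearest-neighbour 2x upsampling of the coarse features, a 1x1 projection of the higher-resolution skip features, element-wise summation followed by ReLU, and a 3x3 blending convolution. All function names, shapes and weights are hypothetical; this is not the implementation from the thesis.

```python
import numpy as np


def upsample2x_nearest(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution, i.e. per-pixel channel mixing.
    x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)


def conv3x3(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """3x3 convolution (cross-correlation, as in deep learning frameworks)
    with zero padding, so the spatial size is preserved.
    x: (C_in, H, W), w: (C_out, C_in, 3, 3) -> (C_out, H, W)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # constant (zero) padding by default
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx],
                             xp[:, dy:dy + h, dx:dx + wd])
    return out


def ladder_step(coarse, skip, w_skip, w_blend):
    """Hypothetical ladder-style fusion step.

    coarse:  (C, H, W) features from the deeper, lower-resolution level.
    skip:    (C_s, 2H, 2W) encoder features at twice the resolution.
    w_skip:  (C, C_s) projection of the skip features to C channels.
    w_blend: (C_out, C, 3, 3) blending convolution.
    """
    up = upsample2x_nearest(coarse)        # (C, 2H, 2W)
    lateral = conv1x1(skip, w_skip)        # (C, 2H, 2W)
    fused = np.maximum(up + lateral, 0.0)  # element-wise sum, then ReLU
    return conv3x3(fused, w_blend)         # (C_out, 2H, 2W)


rng = np.random.default_rng(0)
coarse = rng.standard_normal((8, 4, 4))  # deeper, coarser features
skip = rng.standard_normal((16, 8, 8))   # shallower features at 2x resolution
out = ladder_step(coarse, skip,
                  rng.standard_normal((8, 16)),
                  rng.standard_normal((8, 8, 3, 3)))
print(out.shape)  # (8, 8, 8)
```

In this sketch the upsampling path stays cheap because each step adds only one 1x1 and one 3x3 convolution at the current resolution, which is one way to read the "asymmetric" (heavy feature extractor, light upsampling path) design mentioned in the abstract.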

Original language
English

Scientific fields
Computing

RELATED ITEMS

Projects:
- IP-2020-02-5851 - Napredna gusta predikcija za računalni vid (Advanced dense prediction for computer vision, ADEPT) (Šegvić, Siniša)

Institutions:
Fakultet elektrotehnike i računarstva, Zagreb

Profiles:

Ivan Krešo (author)