Data source: CROSBI

Design of performance optimized transform and quantization computation blocks for video compression in heterogeneous high performance computing systems (CROSBI ID 434212)

Thesis | doctoral dissertation

Čobrnić, Mate. Design of performance optimized transform and quantization computation blocks for video compression in heterogeneous high performance computing systems / Mario Kovač (mentor); Zagreb, Fakultet elektrotehnike i računarstva, 2020

Responsibility details

Čobrnić, Mate

Mario Kovač

English

Design of performance optimized transform and quantization computation blocks for video compression in heterogeneous high performance computing systems

Digital video content dominates today's Internet traffic. Its share is expected to keep growing and to reach 80% of all traffic by 2021, which corresponds to about one million minutes of video per second. Video processing devices are therefore expected to provide and support improved compression capability. This will relieve the pressure on storage systems and communication networks while creating the preconditions for further development of video services. Transform and quantization is one of the most compute-intensive parts of modern hybrid video coding systems. Improving the compression capability of this computation block requires more complex algorithms, at the expense of increased implementation complexity. Design requirements for higher throughput, reduced communication latency and low power consumption cannot be met by homogeneous systems, so heterogeneous multiprocessor high-performance systems emerge as a solution. This thesis presents an area-efficient, reusable architecture for the integer discrete cosine transform and quantization, as well as a highly performance-optimized kernel designed for execution on a GPU. For the hardware architecture, optimization is based on exploiting the symmetry and subset properties of the transform matrix. The proposed multiply-accumulate architecture is fully pipelined. It provides a two-way interface over which the processing system can control the data path of the transform process and receive feedback about utilization from the device. The proposed architecture is implemented on an FPGA platform, where it achieves a throughput of 815 Msps and can support real-time encoding of a 4K UHD@30 fps video sequence.
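The symmetry and subset properties mentioned above can be illustrated on the smallest HEVC transform. The sketch below is not the thesis architecture; it is the well-known even-odd decomposition, shown only to make the two properties concrete. The function names `full_transform` and `even_odd_transform4` are hypothetical, and the intermediate scaling and rounding stages of a real encoder are deliberately left out.

```python
# HEVC 4-point and 8-point integer DCT core matrices.
T4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]
T8 = [
    [64,  64,  64,  64,  64,  64,  64,  64],
    [89,  75,  50,  18, -18, -50, -75, -89],
    [83,  36, -36, -83, -83, -36,  36,  83],
    [75, -18, -89, -50,  50,  89,  18, -75],
    [64, -64, -64,  64,  64, -64, -64,  64],
    [50, -89,  18,  75, -75, -18,  89, -50],
    [36, -83,  83, -36, -36,  83, -83,  36],
    [18, -50,  75, -89,  89, -75,  50, -18],
]


def full_transform(matrix, x):
    """Reference 1-D transform: plain matrix-vector product."""
    return [sum(c * v for c, v in zip(row, x)) for row in matrix]


def even_odd_transform4(x):
    """4-point 1-D transform using row symmetry.

    Even rows of T4 are symmetric and odd rows are antisymmetric, so the
    inputs can be folded into sums and differences first. This needs
    8 multiplications instead of the 16 of the full product.
    """
    e0, e1 = x[0] + x[3], x[1] + x[2]   # folded inputs for even rows
    o0, o1 = x[0] - x[3], x[1] - x[2]   # folded inputs for odd rows
    return [
        64 * e0 + 64 * e1,
        83 * o0 + 36 * o1,
        64 * e0 - 64 * e1,
        36 * o0 - 83 * o1,
    ]


# Subset property: the even rows of T8, cut to their first four columns,
# reproduce T4, so one coefficient store can serve both transform sizes.
t4_from_t8 = [row[:4] for row in T8[::2]]

if __name__ == "__main__":
    sample = [10, -3, 7, 25]
    print(even_odd_transform4(sample) == full_transform(T4, sample))
    print(t4_from_t8 == T4)
```

The same folding applies recursively to the larger transform sizes, which is why sharing coefficients and arithmetic between sizes can save area in hardware.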
For the GPU implementation, the performance optimization strategy covered all three aspects of parallel design: exposing as much of the algorithm's intrinsic parallelism as possible, exploiting high-throughput memory, and using instructions efficiently. It combined efficient mapping of transform blocks to thread blocks with efficient vectorized access patterns to shared memory for all transform sizes. Two different GPUs were used to evaluate the proposed implementation. Speedup factors compared to CPU, cuBLAS and AVX2 implementations are up to 80, 19 and 4 times, respectively.
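The real-time claim for the FPGA architecture in the abstract can be sanity-checked with simple arithmetic. The sketch below assumes 4:2:0 chroma subsampling (1.5 samples per pixel) and that every sample passes through the transform exactly once; neither assumption is stated in the abstract, and the helper name `required_sample_rate` is hypothetical.

```python
def required_sample_rate(width, height, fps, samples_per_pixel=1.5):
    """Samples per second that must be transformed for one pass over the video.

    samples_per_pixel=1.5 models 4:2:0 chroma subsampling (an assumption).
    """
    return width * height * samples_per_pixel * fps


REPORTED_THROUGHPUT = 815e6  # 815 Msps, as reported in the abstract

required = required_sample_rate(3840, 2160, 30)   # 4K UHD at 30 fps
headroom = REPORTED_THROUGHPUT / required

if __name__ == "__main__":
    print(f"required: {required / 1e6:.1f} Msps")
    print(f"headroom: {headroom:.2f}x")
```

Under these assumptions a single pass needs roughly 373 Msps, so the reported 815 Msps leaves a little over a factor of two. An encoder that evaluates several block sizes or modes per region would use up part of that margin.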

Video Coding ; High Efficiency Video Coding (HEVC) ; Integer Discrete Cosine Transform (DCT) ; Heterogeneous Computing ; Hardware Acceleration

Publication details

107

19.06.2020.

defended

Details of the degree-granting institution

Fakultet elektrotehnike i računarstva

Zagreb

Related fields

Computing