Parallel solver for shifted systems

Bujanović, Zvonimir; Bosner, Nela

Pregled bibliografske jedinice broj: 1103440

Parallel solver for shifted systems

Bujanović, Zvonimir; Bosner, Nela

Parallel solver for shifted systems // PMAA16
Bordeaux, Francuska, 2016. str. 52-52 (predavanje, nije recenziran, sažetak, znanstveni)

CROSBI ID: 1103440 Za ispravke kontaktirajte CROSBI podršku putem web obrasca

Naslov
Parallel solver for shifted systems

Autori
Bujanović, Zvonimir ; Bosner, Nela

Vrsta, podvrsta i kategorija rada
Sažeci sa skupova, sažetak, znanstveni

Izvornik
PMAA16 / - , 2016, 52-52

Skup
The 9th International Workshop on Parallel Matrix Algorithms and Applications

Mjesto i datum
Bordeaux, Francuska, 06.07.2016. - 08.07.2016

Vrsta sudjelovanja
Predavanje

Vrsta recenzije
Nije recenziran

Ključne riječi
GPU algorithms ; shifted systems ; m-Hessenberg reductions

Sažetak
We propose a combination of a hybrid CPU-GPU and a pure GPU algorithm for solving shifted linear systems with multiple right sides, for a large number of shifts. Such problems appear in control theory when evaluating the transfer function or as a part of an algorithm performing interpolatory model reduction, as well as when computing numerical solution of a large linear system of ODE's. The new algorithm for solving systems of the form $(A - \sigma I)X = B$, for many different $\sigma\in\mathbb{; ; C}; ; $ simultaneously, consists of two phases. In the first phase, we reduce the generally full system matrix $A \in \mathbb{; ; R}; ; ^{; ; n \times n}; ; $, and the full right-hand side matrix $B\in \mathbb{; ; R}; ; ^{; ; n \times m}; ; $, to a suitable form, which enables us to solve the systems with far less computational effort. This reduction is done only once, regardless of the number of shifts: $A$ is transformed to a so-called $m$-Hessenberg form and $B$ is made upper-triangular. For the first transformation, we introduce a highly parallel CPU-GPU hybrid algorithm. The algorithm is blocked ; individual blocks are being reduced by the CPU, and the necessary updates of the rest of the matrix are split among many cores of CPU and GPU. To enhance parallelization, the reduction and the update computation is overlapped. In the second phase, the reduced $m$-Hessenberg--triangular systems are repeatedly being solved for given batches of shifts. This solver is implemented entirely on the GPU, and it annihilates the $m$ subdiagonals of the system matrix simultaneously for all shifts in the batch. The most demanding part of this algorithm are the RQ factorizations of many $m$-Hessenberg matrices independently. Hence, each factorization is run by a different block of threads, while the updates mostly rely on cuBLAS routines. Benefits of such load distribution are demonstrated by numerical experiments: on our platform, both parallel algorithms outperform their CPU-bound counterparts by the factor of $3.5$ for larger dimensions.

Izvorni jezik
Engleski

Znanstvena područja
Matematika, Računarstvo

POVEZANOST RADA

Projekti:
HRZZ-IP-2013-11-9345 - Matematičko modeliranje, analiza i računanje s primjenama na kompleksne mehaničke sustave (MMACACMS) (Drmač, Zlatko, HRZZ - 2013-11) ( CroRIS)

Ustanove:
Prirodoslovno-matematički fakultet, Matematički odjel, Zagreb,
Prirodoslovno-matematički fakultet, Zagreb

Profili:

Nela Bosner (autor)