Parallel solver for shifted systems (CROSBI ID 698522)
Conference contribution in proceedings | abstract of a conference presentation
Authorship information
Bujanović, Zvonimir ; Bosner, Nela
English
Parallel solver for shifted systems
We propose a combination of a hybrid CPU-GPU and a pure GPU algorithm for solving shifted linear systems with multiple right-hand sides and a large number of shifts. Such problems appear in control theory when evaluating the transfer function, as part of algorithms for interpolatory model reduction, and in the numerical solution of large linear systems of ODEs. The new algorithm for solving systems of the form $(A - \sigma I)X = B$ simultaneously for many different $\sigma \in \mathbb{C}$ consists of two phases. In the first phase, we reduce the generally full system matrix $A \in \mathbb{R}^{n \times n}$ and the full right-hand side matrix $B \in \mathbb{R}^{n \times m}$ to a suitable form, which enables us to solve the systems with far less computational effort. This reduction is done only once, regardless of the number of shifts: $A$ is transformed to the so-called $m$-Hessenberg form, and $B$ is made upper triangular. For the first transformation, we introduce a highly parallel CPU-GPU hybrid algorithm. The algorithm is blocked: individual blocks are reduced by the CPU, while the necessary updates of the rest of the matrix are split among the CPU cores and the GPU. To enhance parallelism, the reduction and the update computation are overlapped. In the second phase, the reduced $m$-Hessenberg--triangular systems are solved repeatedly for given batches of shifts. This solver is implemented entirely on the GPU; it annihilates the $m$ subdiagonals of the system matrix simultaneously for all shifts in the batch. The most demanding part of this phase is the computation of independent RQ factorizations of many $m$-Hessenberg matrices. Hence, each factorization is run by a different block of threads, while the updates mostly rely on cuBLAS routines.
Benefits of this load distribution are demonstrated by numerical experiments: on our platform, both parallel algorithms outperform their CPU-bound counterparts by a factor of $3.5$ for larger dimensions.
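The second-phase idea, solving a shifted Hessenberg system in $O(n^2)$ by annihilating the subdiagonals with rotations and back-substituting, can be sketched for the simplest $m = 1$, single right-hand-side case. The sketch below is an assumption-laden illustration, not the paper's implementation: it uses a left-sided Givens QR factorization instead of the RQ variant described in the abstract, works in real arithmetic with real shifts, and replaces the actual reduced matrix with a random Hessenberg stand-in; all names are illustrative.

```python
import numpy as np

def solve_shifted_hessenberg(H, b, sigma):
    """Solve (H - sigma*I) x = b for upper Hessenberg H in O(n^2).

    Illustrative m = 1, single-shift, real-arithmetic variant: the
    single subdiagonal is annihilated with Givens rotations (a QR
    factorization; the paper's batched GPU solver uses RQ), followed
    by back substitution with the resulting triangular factor.
    """
    n = H.shape[0]
    R = H - sigma * np.eye(n)          # shifted matrix, still Hessenberg
    y = b.astype(float).copy()
    for i in range(n - 1):
        # Givens rotation zeroing the subdiagonal entry R[i+1, i].
        a, c = R[i, i], R[i + 1, i]
        r = np.hypot(a, c)
        cs, sn = a / r, c / r
        G = np.array([[cs, sn], [-sn, cs]])
        R[i:i + 2, i:] = G @ R[i:i + 2, i:]   # update the two affected rows
        y[i:i + 2] = G @ y[i:i + 2]           # same rotation on the rhs
    # R is now upper triangular: back substitution.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

# The reduction is paid once; the Hessenberg form is reused per shift.
rng = np.random.default_rng(0)
n = 8
H = np.triu(rng.standard_normal((n, n)), -1)   # stand-in for the reduced A
b = rng.standard_normal(n)
for sigma in (0.3, -1.2, 2.5):
    x = solve_shifted_hessenberg(H, b, sigma)
    assert np.allclose((H - sigma * np.eye(n)) @ x, b)
```

Each shifted solve costs $O(n^2)$ instead of the $O(n^3)$ of a fresh dense factorization, which is what makes the one-time reduction pay off over many shifts.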
GPU algorithms; shifted systems; m-Hessenberg reduction
Contribution details
52-52.
2016.
published
Parent publication details
PMAA16
Conference details
The 9th International Workshop on Parallel Matrix Algorithms and Applications
lecture
06.07.2016-08.07.2016
Bordeaux, France