Nalazite se na CroRIS probnoj okolini. Ovdje evidentirani podaci neće biti pohranjeni u Informacijskom sustavu znanosti RH. Ako je ovo greška, CroRIS produkcijskoj okolini moguće je pristupi putem poveznice www.croris.hr
izvor podataka: crosbi

Parallel solver for shifted systems (CROSBI ID 698522)

Prilog sa skupa u zborniku | sažetak izlaganja sa skupa

Bujanović, Zvonimir ; Bosner, Nela Parallel solver for shifted systems // PMAA16. 2016. str. 52-52

Podaci o odgovornosti

Bujanović, Zvonimir ; Bosner, Nela

engleski

Parallel solver for shifted systems

We propose a combination of a hybrid CPU-GPU and a pure GPU algorithm for solving shifted linear systems with multiple right sides, for a large number of shifts. Such problems appear in control theory when evaluating the transfer function or as a part of an algorithm performing interpolatory model reduction, as well as when computing numerical solution of a large linear system of ODE's. The new algorithm for solving systems of the form $(A - \sigma I)X = B$, for many different $\sigma\in\mathbb{; ; C}; ; $ simultaneously, consists of two phases. In the first phase, we reduce the generally full system matrix $A \in \mathbb{; ; R}; ; ^{; ; n \times n}; ; $, and the full right-hand side matrix $B\in \mathbb{; ; R}; ; ^{; ; n \times m}; ; $, to a suitable form, which enables us to solve the systems with far less computational effort. This reduction is done only once, regardless of the number of shifts: $A$ is transformed to a so-called $m$-Hessenberg form and $B$ is made upper-triangular. For the first transformation, we introduce a highly parallel CPU-GPU hybrid algorithm. The algorithm is blocked ; individual blocks are being reduced by the CPU, and the necessary updates of the rest of the matrix are split among many cores of CPU and GPU. To enhance parallelization, the reduction and the update computation is overlapped. In the second phase, the reduced $m$-Hessenberg--triangular systems are repeatedly being solved for given batches of shifts. This solver is implemented entirely on the GPU, and it annihilates the $m$ subdiagonals of the system matrix simultaneously for all shifts in the batch. The most demanding part of this algorithm are the RQ factorizations of many $m$-Hessenberg matrices independently. Hence, each factorization is run by a different block of threads, while the updates mostly rely on cuBLAS routines. Benefits of such load distribution are demonstrated by numerical experiments: on our platform, both parallel algorithms outperform their CPU-bound counterparts by the factor of $3.5$ for larger dimensions.

GPU algorithms ; shifted systems ; m-Hessenberg reductions

nije evidentirano

nije evidentirano

nije evidentirano

nije evidentirano

nije evidentirano

nije evidentirano

Podaci o prilogu

52-52.

2016.

objavljeno

Podaci o matičnoj publikaciji

PMAA16

Podaci o skupu

The 9th International Workshop on Parallel Matrix Algorithms and Applications

predavanje

06.07.2016-08.07.2016

Bordeaux, Francuska

Povezanost rada

Matematika, Računarstvo