Pregled bibliografske jedinice broj: 1103440
Parallel solver for shifted systems
Parallel solver for shifted systems // PMAA16
Bordeaux, Francuska, 2016. str. 52-52 (predavanje, nije recenziran, sažetak, znanstveni)
CROSBI ID: 1103440 Za ispravke kontaktirajte CROSBI podršku putem web obrasca
Naslov
Parallel solver for shifted systems
Autori
Bujanović, Zvonimir ; Bosner, Nela
Vrsta, podvrsta i kategorija rada
Sažeci sa skupova, sažetak, znanstveni
Izvornik
PMAA16
/ - , 2016, 52-52
Skup
The 9th International Workshop on Parallel Matrix Algorithms and Applications
Mjesto i datum
Bordeaux, Francuska, 06.07.2016. - 08.07.2016
Vrsta sudjelovanja
Predavanje
Vrsta recenzije
Nije recenziran
Ključne riječi
GPU algorithms ; shifted systems ; m-Hessenberg reductions
Sažetak
We propose a combination of a hybrid CPU-GPU and a pure GPU algorithm for solving shifted linear systems with multiple right sides, for a large number of shifts. Such problems appear in control theory when evaluating the transfer function or as a part of an algorithm performing interpolatory model reduction, as well as when computing numerical solution of a large linear system of ODE's. The new algorithm for solving systems of the form $(A - \sigma I)X = B$, for many different $\sigma\in\mathbb{; ; C}; ; $ simultaneously, consists of two phases. In the first phase, we reduce the generally full system matrix $A \in \mathbb{; ; R}; ; ^{; ; n \times n}; ; $, and the full right-hand side matrix $B\in \mathbb{; ; R}; ; ^{; ; n \times m}; ; $, to a suitable form, which enables us to solve the systems with far less computational effort. This reduction is done only once, regardless of the number of shifts: $A$ is transformed to a so-called $m$-Hessenberg form and $B$ is made upper-triangular. For the first transformation, we introduce a highly parallel CPU-GPU hybrid algorithm. The algorithm is blocked ; individual blocks are being reduced by the CPU, and the necessary updates of the rest of the matrix are split among many cores of CPU and GPU. To enhance parallelization, the reduction and the update computation is overlapped. In the second phase, the reduced $m$-Hessenberg--triangular systems are repeatedly being solved for given batches of shifts. This solver is implemented entirely on the GPU, and it annihilates the $m$ subdiagonals of the system matrix simultaneously for all shifts in the batch. The most demanding part of this algorithm are the RQ factorizations of many $m$-Hessenberg matrices independently. Hence, each factorization is run by a different block of threads, while the updates mostly rely on cuBLAS routines. Benefits of such load distribution are demonstrated by numerical experiments: on our platform, both parallel algorithms outperform their CPU-bound counterparts by the factor of $3.5$ for larger dimensions.
Izvorni jezik
Engleski
Znanstvena područja
Matematika, Računarstvo
POVEZANOST RADA
Projekti:
HRZZ-IP-2013-11-9345 - Matematičko modeliranje, analiza i računanje s primjenama na kompleksne mehaničke sustave (MMACACMS) (Drmač, Zlatko, HRZZ - 2013-11) ( CroRIS)
Ustanove:
Prirodoslovno-matematički fakultet, Matematički odjel, Zagreb,
Prirodoslovno-matematički fakultet, Zagreb