Parallel solver for shifted systems (CROSBI ID 698522)
Conference contribution in proceedings | abstract of a conference presentation
Authorship information
Bujanović, Zvonimir ; Bosner, Nela
English
Parallel solver for shifted systems
We propose a combination of a hybrid CPU-GPU and a pure GPU algorithm for solving shifted linear systems with multiple right-hand sides and a large number of shifts. Such problems appear in control theory when evaluating the transfer function, as part of algorithms for interpolatory model reduction, and in the numerical solution of large linear systems of ODEs. The new algorithm for solving systems of the form $(A - \sigma I)X = B$ simultaneously for many different $\sigma \in \mathbb{C}$ consists of two phases. In the first phase, we reduce the generally full system matrix $A \in \mathbb{R}^{n \times n}$ and the full right-hand side matrix $B \in \mathbb{R}^{n \times m}$ to a suitable form, which enables us to solve the systems with far less computational effort. This reduction is done only once, regardless of the number of shifts: $A$ is transformed to the so-called $m$-Hessenberg form, and $B$ is made upper triangular. For the first transformation, we introduce a highly parallel CPU-GPU hybrid algorithm. The algorithm is blocked: individual blocks are reduced by the CPU, while the necessary updates of the rest of the matrix are split among the CPU cores and the GPU. To enhance parallelism, the reduction and the update computation are overlapped. In the second phase, the reduced $m$-Hessenberg--triangular systems are solved repeatedly for given batches of shifts. This solver is implemented entirely on the GPU; it annihilates the $m$ subdiagonals of the system matrix simultaneously for all shifts in the batch. The most demanding part of this phase is the computation of independent RQ factorizations of many $m$-Hessenberg matrices. Hence, each factorization is run by a different block of threads, while the updates mostly rely on cuBLAS routines.
Benefits of this load distribution are demonstrated by numerical experiments: on our platform, both parallel algorithms outperform their CPU-bound counterparts by a factor of $3.5$ for larger dimensions.
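The second-phase idea, solving a shifted Hessenberg system in $O(n^2)$ by annihilating the subdiagonals with rotations and back-substituting, can be sketched for the simplest $m = 1$, single right-hand-side case. The sketch below is an assumption-laden illustration, not the paper's implementation: it uses a left-sided Givens QR factorization instead of the RQ variant described in the abstract, works in real arithmetic with real shifts, and replaces the actual reduced matrix with a random Hessenberg stand-in; all names are illustrative.

```python
import numpy as np

def solve_shifted_hessenberg(H, b, sigma):
    """Solve (H - sigma*I) x = b for upper Hessenberg H in O(n^2).

    Illustrative m = 1, single-shift, real-arithmetic variant: the
    single subdiagonal is annihilated with Givens rotations (a QR
    factorization; the paper's batched GPU solver uses RQ), followed
    by back substitution with the resulting triangular factor.
    """
    n = H.shape[0]
    R = H - sigma * np.eye(n)          # shifted matrix, still Hessenberg
    y = b.astype(float).copy()
    for i in range(n - 1):
        # Givens rotation zeroing the subdiagonal entry R[i+1, i].
        a, c = R[i, i], R[i + 1, i]
        r = np.hypot(a, c)
        cs, sn = a / r, c / r
        G = np.array([[cs, sn], [-sn, cs]])
        R[i:i + 2, i:] = G @ R[i:i + 2, i:]   # update the two affected rows
        y[i:i + 2] = G @ y[i:i + 2]           # same rotation on the rhs
    # R is now upper triangular: back substitution.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

# The reduction is paid once; the Hessenberg form is reused per shift.
rng = np.random.default_rng(0)
n = 8
H = np.triu(rng.standard_normal((n, n)), -1)   # stand-in for the reduced A
b = rng.standard_normal(n)
for sigma in (0.3, -1.2, 2.5):
    x = solve_shifted_hessenberg(H, b, sigma)
    assert np.allclose((H - sigma * np.eye(n)) @ x, b)
```

Each shifted solve costs $O(n^2)$ instead of the $O(n^3)$ of a fresh dense factorization, which is what makes the one-time reduction pay off over many shifts.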
GPU algorithms; shifted systems; m-Hessenberg reduction
Contribution details
52-52.
2016.
published
Parent publication details
PMAA16
Conference details
The 9th International Workshop on Parallel Matrix Algorithms and Applications
lecture
06.07.2016-08.07.2016
Bordeaux, France