Solving partial differential equations with PGI CUDA Fortran

  • Published: 2014-12-08

(1) Introduction to NVIDIA hardware and CUDA architecture

Multiprocessors and memory hierarchy. Kernel, threads, blocks and grids, warps. Compute-capability features and limits, floating-point arithmetic, memory coalescing. Stream management.



(2) Introduction to PGI CUDA Fortran and PGI accelerator

Why Fortran, why CUDA Fortran. Hierarchy of CUDA Fortran, CUDA C and CUDA Runtime API. A CUDA Fortran source-code template. Kernel and device subroutines. Configuring kernel calls. Device, shared, constant and pinned memory declaration. Synchronization of threads. An alternative to CUDA Fortran: PGI accelerator directives.
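
As an illustration of "configuring kernel calls", the following sketch works through the usual arithmetic for a one-dimensional launch: how many blocks are needed to cover n elements, and which global index each thread handles. It is written in plain Python rather than CUDA Fortran so that it runs anywhere. The helper names and the default of 256 threads per block are assumptions made for this example. The index formula mirrors the 1-based CUDA Fortran convention (blockIdx%x - 1)*blockDim%x + threadIdx%x.

```python
def launch_config(n, threads_per_block=256):
    """Return (blocks, threads) for a 1D grid that covers n elements.

    threads_per_block=256 is an illustrative default, not a recommendation.
    """
    blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division
    return blocks, threads_per_block


def global_index(block_idx, thread_idx, block_dim):
    """1-based global index, following the CUDA Fortran convention."""
    return (block_idx - 1) * block_dim + thread_idx


def covered_indices(n, threads_per_block=256):
    """Emulate the launch serially and list the indices that pass the bounds guard."""
    blocks, threads = launch_config(n, threads_per_block)
    hit = []
    for b in range(1, blocks + 1):
        for t in range(1, threads + 1):
            i = global_index(b, t, threads)
            if i <= n:  # guard: the last block may contain surplus threads
                hit.append(i)
    return hit
```

The bounds guard matters whenever n is not a multiple of the block size, because the last block then contains threads with no element to process.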



(3) Linear algebra and interpolation

Compute- and memory-bound kernels. Simple linear algebra with CUDA Fortran and CULA library. Direct and iterative methods for linear algebraic equations. Linear and spline interpolation in one and more dimensions.
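
To make "iterative methods for linear algebraic equations" concrete, here is a minimal sketch of Jacobi iteration, chosen because every component of the new iterate depends only on the previous iterate and can therefore be computed by an independent thread. The function name, the zero initial guess and the fixed iteration count are assumptions of this example. Convergence is guaranteed only for suitable matrices, for instance strictly diagonally dominant ones.

```python
def jacobi(A, b, iterations=100):
    """Approximate the solution of A x = b with a fixed number of Jacobi sweeps.

    A is a list of rows, b a list of right-hand-side values.
    Assumes non-zero diagonal entries and a matrix for which Jacobi converges.
    """
    n = len(b)
    x = [0.0] * n  # zero initial guess (an arbitrary choice)
    for _ in range(iterations):
        # Each new component uses only the old vector x, so all n updates
        # are independent of one another.
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x
```

For example, `jacobi([[4.0, 1.0], [1.0, 3.0]], [6.0, 7.0])` approaches the exact solution x = 1, y = 2.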



(4) Initial-value problems for ordinary differential equations

Runge-Kutta methods. Predictor-corrector methods. Implicit methods. Example of Lorenz-attractor solutions.
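
A minimal sketch of the classical fourth-order Runge-Kutta step applied to the Lorenz system is given below. The parameter values sigma = 10, rho = 28, beta = 8/3 are the commonly quoted ones and are an assumption here, as are the function names. The course may use other methods, parameters or step sizes.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system for state = (x, y, z)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(f, state, dt):
    """Advance state by one classical RK4 step of size dt for y' = f(y)."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(f, state, dt, steps):
    """Return the trajectory as a list of states, including the initial one."""
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(f, state, dt)
        trajectory.append(state)
    return trajectory
```

A call such as `integrate(lorenz, (1.0, 1.0, 1.0), 0.01, 1000)` produces a trajectory of 1001 states. A single trajectory is inherently sequential in time, so parallel speedup on a GPU would typically come from integrating many initial conditions at once.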



(5) Explicit methods for evolutionary partial differential equations

Heat equation in one, two and three dimensions. Spatial discretization: stencils of 2nd- and higher-order finite differences. Block and tiling implementations. Speedups for various compute capabilities.
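
The simplest instance of the explicit approach is the forward-time, centred-space scheme for the one-dimensional heat equation u_t = alpha u_xx with a 2nd-order three-point stencil. The sketch below assumes fixed (Dirichlet) end values and a dimensionless parameter r = alpha*dt/dx**2. The function names are made up for this example.

```python
def heat_step(u, r):
    """One explicit step of the 1D heat equation.

    u : list of grid values; the two end values are held fixed.
    r : alpha*dt/dx**2, which must not exceed 0.5 for stability.
    """
    new = u[:]  # copy, so every update reads only old values
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def heat_solve(u0, r, steps):
    """Apply heat_step repeatedly, refusing unstable step sizes."""
    if r > 0.5:
        raise ValueError("explicit scheme is unstable for r > 0.5")
    u = list(u0)
    for _ in range(steps):
        u = heat_step(u, r)
    return u
```

Because each new value depends only on old neighbouring values, all interior points can be updated independently, which is what makes the scheme a natural fit for one thread per grid point.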



(6) More methods for more partial differential equations

Fully implicit and Crank-Nicolson schemes. Method of lines. Alternating-direction implicit method. Multigrid methods. Wave equation in one and more dimensions.
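
For the one-dimensional heat equation, a Crank-Nicolson step reduces to solving a tridiagonal system at every time level. The sketch below uses the Thomas algorithm for that solve, again with fixed end values and r = alpha*dt/dx**2. The function names and the choice of the Thomas algorithm are assumptions of this example. On a GPU a parallel tridiagonal solver, such as cyclic reduction, would usually be preferred.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system.

    a: sub-diagonal (a[0] unused), b: diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    Assumes no pivoting is needed (e.g. a diagonally dominant matrix).
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):  # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def crank_nicolson_step(u, r):
    """One Crank-Nicolson step for u_t = alpha*u_xx with fixed end values."""
    n = len(u) - 2  # number of interior unknowns (at least one)
    a = [-0.5 * r] * n
    b = [1.0 + r] * n
    c = [-0.5 * r] * n
    # Explicit half of the scheme, evaluated at the old time level.
    d = [(1.0 - r) * u[i] + 0.5 * r * (u[i - 1] + u[i + 1]) for i in range(1, n + 1)]
    # Known boundary values at the new time level move to the right-hand side.
    d[0] += 0.5 * r * u[0]
    d[-1] += 0.5 * r * u[-1]
    return [u[0]] + thomas(a, b, c, d) + [u[-1]]
```

Unlike the explicit scheme, this step remains stable for r larger than 0.5, at the cost of a linear solve per time step.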



(7) Technical issues

CUDA-runtime API calls. Asynchronous streams and memory transfers. Pitfalls of inter-block synchronization. Interoperability of Fortran with C and CUDA C kernels. Running on GPU clusters with MPI calls.



Ladislav (Larry) Hanyk

Charles University Prague
Faculty of Mathematics and Physics
Department of Geophysics