(1) Introduction to NVIDIA hardware and CUDA architecture
Multiprocessors and memory hierarchy. Kernel, threads, blocks and grids, warps. Compute-capability features and limits, floating-point arithmetic, memory coalescing. Stream management.
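Compute-capability features and limits can be inspected at run time. A minimal CUDA Fortran sketch (assuming the `cudafor` module of the PGI/NVIDIA compilers and device number 0):

```fortran
program devquery
  use cudafor                      ! CUDA Fortran runtime module
  implicit none
  type(cudaDeviceProp) :: prop
  integer :: istat
  istat = cudaGetDeviceProperties(prop, 0)      ! properties of device 0
  print *, 'Device:             ', trim(prop%name)
  print *, 'Compute capability: ', prop%major, prop%minor
  print *, 'Multiprocessors:    ', prop%multiProcessorCount
  print *, 'Max threads/block:  ', prop%maxThreadsPerBlock
end program devquery
```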
(2) Introduction to PGI CUDA Fortran and PGI accelerator
Why Fortran, why CUDA Fortran. Hierarchy of CUDA Fortran, CUDA C and CUDA Runtime API. A CUDA Fortran source-code template. Kernel and device subroutines. Configuring kernel calls. Device, shared, constant and pinned memory declaration. Synchronization of threads. An alternative to CUDA Fortran: PGI accelerator directives.
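The pieces above fit together in a short template. A hedged sketch of a complete CUDA Fortran program (kernel names and the block size 256 are illustrative choices, not prescribed by the course):

```fortran
module kernels
  implicit none
contains
  ! kernel: each thread scales one array element
  attributes(global) subroutine scale(a, c, n)
    real, device :: a(*)
    real, value :: c
    integer, value :: n
    integer :: i
    i = (blockidx%x - 1) * blockdim%x + threadidx%x
    if (i <= n) a(i) = c * a(i)
  end subroutine scale
end module kernels

program template
  use cudafor
  use kernels
  implicit none
  integer, parameter :: n = 1024
  real :: a(n)
  real, device :: a_d(n)                 ! device-memory declaration
  a = 1.0
  a_d = a                                ! host-to-device copy by assignment
  call scale<<<(n+255)/256, 256>>>(a_d, 2.0, n)   ! kernel call configuration
  a = a_d                                ! device-to-host copy
  print *, a(1)
end program template
```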
(3) Linear algebra and interpolation
Compute- and memory-bound kernels. Simple linear algebra with CUDA Fortran and CULA library. Direct and iterative methods for linear algebraic equations. Linear and spline interpolation in one and more dimensions.
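A matrix-vector product is a typical memory-bound kernel. A sketch with one thread per row (note that Fortran's column-major storage makes the reads of `a(i,j)` across consecutive threads coalesced; the name `matvec` is illustrative):

```fortran
! memory-bound matvec y = A x, one thread per row of A
attributes(global) subroutine matvec(a, x, y, n)
  integer, value :: n
  real, device :: a(n,*), x(*), y(*)
  integer :: i, j
  real :: s
  i = (blockidx%x - 1) * blockdim%x + threadidx%x
  if (i <= n) then
    s = 0.0
    do j = 1, n
      s = s + a(i,j) * x(j)    ! consecutive i -> coalesced global loads
    end do
    y(i) = s
  end if
end subroutine matvec
```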
(4) Initial-value problems for ordinary differential equations
Runge-Kutta methods. Predictor-corrector methods. Implicit methods. Example of Lorenz-attractor solutions.
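For the Lorenz attractor, the classical 4th-order Runge-Kutta step is short enough to show whole; the same loop body can run per-thread on the device when many trajectories are integrated at once. A host-side Fortran sketch (standard Lorenz parameters assumed):

```fortran
module lorenz_mod
  implicit none
  real, parameter :: sigma = 10.0, rho = 28.0, beta = 8.0/3.0
contains
  function f(y) result(dy)             ! Lorenz right-hand side
    real, intent(in) :: y(3)
    real :: dy(3)
    dy(1) = sigma * (y(2) - y(1))
    dy(2) = y(1) * (rho - y(3)) - y(2)
    dy(3) = y(1) * y(2) - beta * y(3)
  end function f
end module lorenz_mod

program lorenz_rk4
  use lorenz_mod
  implicit none
  real :: y(3), k1(3), k2(3), k3(3), k4(3), h
  integer :: it
  y = (/ 1.0, 1.0, 1.0 /)
  h = 0.01
  do it = 1, 1000                      ! classical RK4 steps
    k1 = f(y)
    k2 = f(y + 0.5*h*k1)
    k3 = f(y + 0.5*h*k2)
    k4 = f(y + h*k3)
    y = y + h/6.0 * (k1 + 2.0*k2 + 2.0*k3 + k4)
  end do
  print *, y
end program lorenz_rk4
```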
(5) Explicit methods for evolutionary partial differential equations
Heat equation in one, two and three dimensions. Spatial discretization: stencils of 2nd- and higher-order finite differences. Block and tiling implementations. Speedups for various compute capabilities.
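The tiling idea in one dimension: each block stages its grid points plus halo cells in shared memory, so each interior value is loaded from global memory once instead of three times. A sketch assuming a fixed block size of 256 threads:

```fortran
! one explicit Euler step of the 1-D heat equation, 2nd-order stencil;
! r = kappa*dt/dx**2, one thread per grid point
attributes(global) subroutine heat_step(u, unew, r, n)
  integer, value :: n
  real, device :: u(n), unew(n)
  real, value :: r
  real, shared :: tile(0:257)          ! 256 points + 2 halo cells
  integer :: i, t
  t = threadidx%x
  i = (blockidx%x - 1) * blockdim%x + t
  if (i <= n) tile(t) = u(i)
  if (t == 1 .and. i > 1)          tile(0)   = u(i-1)   ! left halo
  if (t == blockdim%x .and. i < n) tile(t+1) = u(i+1)   ! right halo
  call syncthreads()                   ! tile complete before use
  if (i > 1 .and. i < n) &
    unew(i) = tile(t) + r * (tile(t-1) - 2.0*tile(t) + tile(t+1))
end subroutine heat_step
```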
(6) More methods for more partial differential equations
Fully implicit and Crank-Nicolson schemes. Method of lines. Alternating direction implicit method. Multigrid methods. Wave equation in one and more dimensions.
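The smoother inside a multigrid cycle maps naturally onto one kernel per sweep. A hedged sketch of a weighted-Jacobi sweep for the 1-D Poisson problem -u'' = f (omega = 2/3 is the usual 1-D weight; the name `jacobi_sweep` is illustrative):

```fortran
! one weighted-Jacobi sweep for -u'' = f on a uniform grid;
! h2 = dx**2, boundary values u(1), u(n) held fixed
attributes(global) subroutine jacobi_sweep(u, unew, f, h2, omega, n)
  integer, value :: n
  real, device :: u(n), unew(n), f(n)
  real, value :: h2, omega
  integer :: i
  i = (blockidx%x - 1) * blockdim%x + threadidx%x
  if (i > 1 .and. i < n) &
    unew(i) = (1.0 - omega)*u(i) + omega * 0.5 * (u(i-1) + u(i+1) + h2*f(i))
end subroutine jacobi_sweep
```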
(7) Technical issues
CUDA-runtime API calls. Asynchronous streams and memory transfers. Pitfalls of inter-block synchronization. Interoperability of Fortran with C and CUDA C kernels. Running on GPU clusters with MPI calls.
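Asynchronous transfers require pinned host memory and a stream; work queued in one stream executes in order but can overlap with other streams and with the host. A CUDA Fortran sketch of the pattern:

```fortran
program async_copy
  use cudafor
  implicit none
  integer, parameter :: n = 1048576
  real, pinned, allocatable :: a(:)      ! pinned host memory: required for async copies
  real, device, allocatable :: a_d(:)
  integer(kind=cuda_stream_kind) :: stream
  integer :: istat
  allocate(a(n), a_d(n))
  a = 0.0
  istat = cudaStreamCreate(stream)
  ! returns to the host immediately; ordered within the stream
  istat = cudaMemcpyAsync(a_d, a, n, cudaMemcpyHostToDevice, stream)
  ! ... kernels queued in the same stream run after the copy:
  ! call mykernel<<<grid, block, 0, stream>>>(a_d, n)
  istat = cudaStreamSynchronize(stream)  ! wait before touching a again
  istat = cudaStreamDestroy(stream)
end program async_copy
```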
Ladislav (Larry) Hanyk
Charles University Prague
Faculty of Mathematics and Physics
Department of Geophysics