(1) Introduction to NVIDIA hardware and CUDA architecture
	
	Multiprocessors and the memory hierarchy. Kernels, threads, blocks, grids and warps. Compute-capability features and limits, floating-point arithmetic, memory coalescing. Stream management.
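The coalescing point can be sketched with two kernels, one where consecutive threads touch consecutive elements and one where they jump by a stride (kernel names and the stride parameter are illustrative, not from the lecture):

```fortran
module access_m
  use cudafor
  implicit none
contains
  ! Consecutive threads of a warp read consecutive elements:
  ! the accesses coalesce into a few wide memory transactions.
  attributes(global) subroutine copy_coalesced(n, a, b)
    integer, value :: n
    real :: a(n), b(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) b(i) = a(i)
  end subroutine copy_coalesced

  ! Neighbouring threads jump by a large stride: each access may
  ! require its own transaction, wasting memory bandwidth.
  attributes(global) subroutine copy_strided(n, stride, a, b)
    integer, value :: n, stride
    real :: a(n), b(n)
    integer :: i, j
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    j = mod((i - 1) * stride, n) + 1
    if (i <= n) b(i) = a(j)
  end subroutine copy_strided
end module access_m
```

Timing the two kernels on the same array is a simple way to measure the coalescing penalty of a given compute capability.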
	
	
	
(2) Introduction to PGI CUDA Fortran and PGI Accelerator
	
	Why Fortran, and why CUDA Fortran. The hierarchy of CUDA Fortran, CUDA C and the CUDA Runtime API. A CUDA Fortran source-code template. Kernel and device subroutines. Configuring kernel launches. Declaring device, shared, constant and pinned memory. Thread synchronization. An alternative to CUDA Fortran: PGI Accelerator directives.
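A minimal source-code template of the kind discussed here might look as follows (the saxpy example, array sizes and names are illustrative): a device module holding the kernel, and a host program that declares device arrays, copies data by assignment, and configures the launch with the chevron syntax.

```fortran
module kernels_m
  use cudafor
  implicit none
contains
  ! Kernel: scalar arguments carry the value attribute,
  ! array arguments live in device memory.
  attributes(global) subroutine saxpy(n, a, x, y)
    integer, value :: n
    real, value :: a
    real :: x(n), y(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x   ! global index, 1-based
    if (i <= n) y(i) = a * x(i) + y(i)
  end subroutine saxpy
end module kernels_m

program main
  use cudafor
  use kernels_m
  implicit none
  integer, parameter :: n = 1024
  real :: x(n), y(n)                 ! host arrays
  real, device :: x_d(n), y_d(n)     ! device arrays
  x = 1.0;  y = 2.0
  x_d = x;  y_d = y                  ! host-to-device copies by assignment
  call saxpy<<<(n + 255) / 256, 256>>>(n, 2.0, x_d, y_d)  ! grid, block configuration
  y = y_d                            ! device-to-host copy
  print *, y(1)
end program main
```

The chevron configuration `<<<grid, block>>>` and the implicit transfers by array assignment are the two points where CUDA Fortran is visibly terser than CUDA C.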
	
	
	
	(3) Linear algebra and interpolation
	
	Compute- and memory-bound kernels. Simple linear algebra with CUDA Fortran and CULA library. Direct and iterative methods for linear algebraic equations. Linear and spline interpolation in one and more dimensions.
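Linear interpolation on a uniform grid is a natural memory-bound kernel: one thread per query point, each doing a handful of flops. A hedged sketch (grid layout and names are illustrative):

```fortran
module interp_m
  use cudafor
  implicit none
contains
  ! Interpolate f, sampled at x0 + (j-1)*dx for j = 1..nx,
  ! at the nq query points xq; one thread per query point.
  attributes(global) subroutine lininterp(nx, x0, dx, f, nq, xq, fq)
    integer, value :: nx, nq
    real, value :: x0, dx
    real :: f(nx), xq(nq), fq(nq)
    integer :: i, j
    real :: t
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i > nq) return
    j = min(max(int((xq(i) - x0) / dx) + 1, 1), nx - 1)  ! left grid index, clamped
    t = (xq(i) - (x0 + (j - 1) * dx)) / dx               ! local coordinate in [0,1]
    fq(i) = (1.0 - t) * f(j) + t * f(j + 1)
  end subroutine lininterp
end module interp_m
```

Spline interpolation follows the same one-thread-per-point pattern once the spline coefficients have been computed (that setup is a tridiagonal solve, a topic of the implicit-methods lecture).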
	
	
	
	(4) Initial-value problems for ordinary differential equations
	
	Runge-Kutta methods. Predictor-corrector methods. Implicit methods. Example of Lorenz-attractor solutions.
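The Lorenz example lends itself to a one-thread-per-trajectory kernel: many initial conditions integrated independently with the classical 4th-order Runge-Kutta method. A sketch, assuming the standard parameter values (sigma = 10, rho = 28, beta = 8/3); names and the data layout are illustrative:

```fortran
module lorenz_m
  use cudafor
  implicit none
  real, parameter :: sigma = 10.0, rho = 28.0, beta = 8.0 / 3.0
contains
  ! Right-hand side of the Lorenz system.
  attributes(device) subroutine deriv(u, du)
    real :: u(3), du(3)
    du(1) = sigma * (u(2) - u(1))
    du(2) = u(1) * (rho - u(3)) - u(2)
    du(3) = u(1) * u(2) - beta * u(3)
  end subroutine deriv

  ! One thread per trajectory: advance nstep classical RK4 steps of size h.
  attributes(global) subroutine rk4_lorenz(n, u, h, nstep)
    integer, value :: n, nstep
    real, value :: h
    real :: u(3, n)
    real :: y(3), yt(3), k1(3), k2(3), k3(3), k4(3)
    integer :: i, s
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i > n) return
    y = u(:, i)
    do s = 1, nstep
      call deriv(y, k1)
      yt = y + 0.5 * h * k1;  call deriv(yt, k2)
      yt = y + 0.5 * h * k2;  call deriv(yt, k3)
      yt = y + h * k3;        call deriv(yt, k4)
      y = y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    end do
    u(:, i) = y
  end subroutine rk4_lorenz
end module lorenz_m
```

Since each trajectory is independent, no inter-thread communication is needed and the kernel is compute-bound, the favourable case on GPUs.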
	
	
	
	(5) Explicit methods for evolutionary partial differential equations
	
	Heat equation in one, two and three dimensions. Spatial discretization: stencils of 2nd- and higher-order finite differences. Block and tiling implementations. Speedups for various compute capabilities.
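The simplest variant, a 1-D explicit step with a 2nd-order central stencil and all data in global memory, might look like this sketch (the tiled shared-memory versions refine it; names are illustrative):

```fortran
module heat_m
  use cudafor
  implicit none
contains
  ! One explicit Euler step of u_t = alpha * u_xx;
  ! r = alpha * dt / dx**2 (r <= 1/2 for stability).
  ! Boundary points i = 1 and i = n are held fixed.
  attributes(global) subroutine heat_step(n, r, u, unew)
    integer, value :: n
    real, value :: r
    real :: u(n), unew(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i > 1 .and. i < n) unew(i) = u(i) + r * (u(i-1) - 2.0 * u(i) + u(i+1))
  end subroutine heat_step
end module heat_m
```

The host alternates the roles of `u` and `unew` between steps; in two and three dimensions, loading a block's tile (plus its halo) into shared memory cuts the redundant global-memory reads that this naive version performs.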
	
	
	
	(6) More methods for more partial differential equations
	
	Fully implicit and Crank-Nicolson schemes. Method of lines. Alternating-direction implicit method. Multigrid methods. Wave equation in one and more dimensions.
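The fully implicit, Crank-Nicolson and ADI schemes in one space dimension all reduce to tridiagonal systems, solved in O(n) by the Thomas algorithm. A host-side Fortran sketch (argument names are illustrative):

```fortran
! Thomas algorithm: solve a tridiagonal system with sub-diagonal a,
! main diagonal b, super-diagonal c and right-hand side d.
! One forward elimination sweep, one back substitution sweep.
subroutine thomas(n, a, b, c, d, x)
  implicit none
  integer, intent(in) :: n
  real, intent(in) :: a(n), b(n), c(n), d(n)
  real, intent(out) :: x(n)
  real :: cp(n), dp(n), m
  integer :: i
  cp(1) = c(1) / b(1)
  dp(1) = d(1) / b(1)
  do i = 2, n                          ! forward elimination
    m = b(i) - a(i) * cp(i-1)
    cp(i) = c(i) / m
    dp(i) = (d(i) - a(i) * dp(i-1)) / m
  end do
  x(n) = dp(n)
  do i = n - 1, 1, -1                  ! back substitution
    x(i) = dp(i) - cp(i) * x(i+1)
  end do
end subroutine thomas
```

The recurrences are inherently sequential along the line, which is why ADI on the GPU assigns one line per thread and sweeps many lines in parallel rather than parallelizing a single solve.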
	
	
	
	(7) Technical issues
	
	CUDA Runtime API calls. Asynchronous streams and memory transfers. Pitfalls of inter-block synchronization. Interoperability of Fortran with C and CUDA C kernels. Running on GPU clusters with MPI.
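Asynchronous transfers in CUDA Fortran combine pinned host memory with streams; a sketch splitting one array over two streams (sizes and names are illustrative; the kernels that would overlap with the copies are only indicated):

```fortran
program streams
  use cudafor
  implicit none
  integer, parameter :: n = 1024 * 1024, nstream = 2
  real, pinned, allocatable :: a(:)          ! page-locked host buffer,
                                             ! required for truly async copies
  real, device, allocatable :: a_d(:)
  integer(kind=cuda_stream_kind) :: s(nstream)
  integer :: i, istat, chunk
  allocate(a(n), a_d(n))
  a = 1.0
  chunk = n / nstream
  do i = 1, nstream
    istat = cudaStreamCreate(s(i))
  end do
  do i = 1, nstream
    ! copy chunk i in stream i; a kernel launched into s(i) afterwards
    ! would overlap with the copy running in the other stream
    istat = cudaMemcpyAsync(a_d((i-1)*chunk + 1), a((i-1)*chunk + 1), chunk, s(i))
  end do
  istat = cudaDeviceSynchronize()            ! wait for all streams to finish
  do i = 1, nstream
    istat = cudaStreamDestroy(s(i))
  end do
  deallocate(a, a_d)
end program streams
```

Without the `pinned` attribute the asynchronous copy silently degrades to a synchronous one, a common source of missing overlap.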
	
	
	
Ladislav (Larry) Hanyk
	
Charles University in Prague
	Faculty of Mathematics and Physics
	Department of Geophysics