Part II — The Parallel Execution Layer

This part focuses on the mechanisms used to execute parallel programs on supercomputing systems. Through classical programming models such as MPI and CUDA, it introduces the fundamental concepts of parallel execution, communication, synchronization, and acceleration.

Part II builds on the infrastructure concepts introduced earlier, but it can also be read selectively. Readers whose primary interest lies in deep learning frameworks may treat this part as contextual background, while those with a strong HPC orientation will find it essential for understanding the performance behavior and execution models that later chapters rely on.

4. Launching and Structuring Parallel Programs
5. GPU Programming and CUDA
6. Distributed GPU Programming
