Part IV — The Scalability Layer
This part addresses one of the central challenges of modern AI systems: scaling training efficiently across multiple GPUs and nodes. Through hands-on case studies using TensorFlow and PyTorch, it explores different forms of parallelism, distributed execution strategies, and performance trade-offs.
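To make these ideas concrete before the case studies, the sketch below illustrates the simplest of these strategies, data parallelism, with PyTorch's DistributedDataParallel wrapper. The tiny linear model, random batches, and launch command are illustrative assumptions for this overview, not examples taken from the chapters themselves.

    # Minimal data-parallel training sketch with PyTorch DDP (assumed setup, not the book's code).
    # Launch with: torchrun --nproc_per_node=N train_ddp.py
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment,
        # so the default env:// initialization works out of the box.
        dist.init_process_group(backend="gloo")  # use "nccl" when each rank owns a GPU
        rank = dist.get_rank()

        model = nn.Linear(10, 1)       # placeholder model
        ddp_model = DDP(model)         # each rank holds a full replica
        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
        loss_fn = nn.MSELoss()

        for step in range(10):
            # In practice each rank reads its own shard of the dataset;
            # random tensors stand in for that here.
            inputs = torch.randn(32, 10)
            targets = torch.randn(32, 1)
            optimizer.zero_grad()
            loss = loss_fn(ddp_model(inputs), targets)
            loss.backward()            # DDP all-reduces (averages) gradients across ranks
            optimizer.step()
            if rank == 0:
                print(f"step {step}: loss {loss.item():.4f}")

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Run this way, every process trains on its own slice of the data while DDP keeps the model replicas in sync by averaging gradients during the backward pass, which is the basic pattern the case studies in this part build on and then compare against model- and pipeline-parallel alternatives.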
Part IV is a key convergence point in the book, bringing together infrastructure knowledge, execution models, and AI frameworks. It can serve as a core component of courses on scalable deep learning, even when the earlier parts are covered only partially.