"The chapters told me what to do and why it works. When I actually needed the gradient identity, a cluster with four GPUs, the meaning of a symbol, and a dataset to point all of it at, I came back here. The back matter is where the book stops explaining and starts handing me tools."
A Reader Returning for the Tools, Not the Theory
The four appendices are not afterthoughts; they are the working infrastructure that lets the forty-one chapters stay focused on distribution by holding the shared dependencies in one place. A book that distributes AI across many machines rests on a small set of things every chapter needs and none can afford to re-derive each time: the linear algebra, probability, optimization, and information theory the methods are built from; a concrete cluster the runnable labs and the capstone actually execute on; a single authoritative table for the notation and terms the chapters reuse on every page; and a curated set of datasets, benchmarks, and tools to point a project at. Pulling these into the back matter keeps each chapter from carrying its own math refresher, its own setup guide, and its own glossary, and it gives you one stable place to return to whenever a symbol, a derivation, a command, or a benchmark is the thing standing between you and the next paragraph. Read the chapters in order; reach for the appendices the moment a chapter assumes something you want made explicit.
Overview
The eight parts of this book teach a single move performed in many forms: split an AI workload across machines, move the necessary information between them, and recombine it correctly while keeping the cost of that movement under control. To stay on that thread, the chapters lean on background they state but do not always pause to develop. The back matter develops it. Four appendices collect the recurring dependencies of the whole book so that any chapter can assume them by reference rather than rebuilding them, and so that you have one durable place to consult when a prerequisite is the thing you actually need.
The appendices fall into two pairs. The first pair supplies the foundations a method rests on: Appendix A is the mathematical background, the linear algebra, probability and statistics, convex optimization and gradient methods, and information-theory essentials that the optimization and parallel-training chapters use without re-deriving; Appendix B is the companion cluster lab, the setup guide that turns the book's code from something you read into something you run, from a local multi-process job on one laptop up to a real multi-node cluster. The second pair supplies the references a project needs: Appendix C is the notation and glossary, the single authoritative list of every symbol and term the chapters reuse; Appendix D is the catalogue of datasets, benchmarks, and resources to anchor an experiment or a capstone. Together they are the standing apparatus the chapters were written against: the math you can look up, the cluster you can run on, the words you can rely on, and the data you can measure against.
If you keep one thing about the appendices, keep this: they are written to be consulted, not read front to back, and each one answers a different question the chapters raise but do not stop to settle. When a derivation in Parts III or IV moves faster than your memory of the underlying mathematics, open Appendix A and read only the section you need. When a chapter shows code and you want to execute it, Appendix B is the one place that takes you from a single process to a multi-node run with the same example. When a symbol or a term is ambiguous, Appendix C fixes its meaning once for the whole book. When a project needs something real to scale, Appendix D points at the datasets and benchmarks that make a measured claim possible. The roadmap below names each appendix and the chapters it most directly supports.
The Four Appendices
- A. Mathematical Background. The self-contained refresher the methods rest on: linear algebra (vectors, matrices, norms, decompositions), probability and statistics, convex optimization and gradient methods, and the information-theory essentials that the optimization and parallel-training chapters of Parts III and IV use without re-deriving them in place.
- B. The Companion Cluster Lab: Setup Guide. The hands-on setup guide that takes the book's code from reading to running: local multi-process on one machine, single- and multi-GPU on one box, multi-node training with NCCL, torchrun, and Slurm, cloud and spot provisioning, and a reproducibility checklist so a result survives a second pair of hands.
- C. Notation and Glossary. The single authoritative reference for the symbols and terms the chapters reuse on every page: the notation conventions for vectors, matrices, gradients, and collectives, and a glossary of the distributed-AI vocabulary from all-reduce and sharding to staleness and stragglers.
- D. Datasets, Benchmarks, and Resources. A curated catalogue of the datasets, benchmarks, and tools a project can point itself at: corpora and model suites large enough to expose a real single-machine ceiling, the standard scaling and serving benchmarks, and the open frameworks and references that anchor a credible capstone.
How Each Appendix Supports the Chapters
Each appendix is tied to a specific stretch of the book, and reading it alongside that stretch is the intended use. Appendix A supports the mathematics of Parts III and IV most directly: the gradient identity that opens Chapter 1, the optimization analysis of distributed SGD, and the linear algebra behind model and sharded parallelism all assume the refresher it provides, which is why Chapter 1 already invites you to read the appendix in parallel with it rather than beforehand. Appendix B supports the runnable labs throughout the book and the capstone in particular: it sets up the cluster the code examples assume and the environment the capstone of Chapter 41 provisions when it asks you to build and measure a distributed system of your own. Appendix C is the notation reference the whole book writes against, the place that fixes once what every chapter then uses consistently. Appendix D points the projects and exercises at datasets and benchmarks substantial enough to expose a true ceiling, the raw material a capstone needs to turn an ambition to scale into a measured result.
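As a small taste of the kind of fact Appendix A backs up, here is a minimal sketch of a gradient identity of the split-and-recombine kind. This page does not spell out the identity that opens Chapter 1, so the sketch assumes the familiar one: the gradient of a mean loss over the full dataset equals the size-weighted average of the gradients computed on each shard. The function names, the toy data, and the scalar least-squares model are all hypothetical and chosen only for illustration; the example runs in one process and stands in for what a collective such as all-reduce would compute across machines.

```python
# Illustrative sketch only: names and data are hypothetical, not taken from the book.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error (1/n) * sum((w*x - y)^2) w.r.t. scalar w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def sharded_grad(w, xs, ys, num_workers):
    """Split the data across workers, compute local gradients, recombine them."""
    # Round-robin sharding; shard sizes may differ by one example.
    shards = [(xs[i::num_workers], ys[i::num_workers]) for i in range(num_workers)]
    total = len(xs)
    # Weight each local mean gradient by its share of the data. On a real
    # cluster this weighted sum is what a sum-style all-reduce would deliver.
    return sum(
        (len(sx) / total) * grad_mse(w, sx, sy)
        for sx, sy in shards
        if sx  # skip empty shards when there are more workers than examples
    )


xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.2, 6.8]
w = 0.3

full = grad_mse(w, xs, ys)
split = sharded_grad(w, xs, ys, num_workers=4)
# The two agree up to floating-point rounding.
print(abs(full - split) < 1e-9)
```

The size weighting matters whenever shards are uneven, as they are here with seven examples on four workers; a plain unweighted mean of the local gradients would match the full-batch gradient only when every shard holds the same number of examples.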
This is the back matter, so there is no next chapter to read; the chapters are behind you and the tools are in front of you. If the appendices send you anywhere, they send you back to the two ends of the arc. They send you back to the capstone of Chapter 41, the place the whole book was built toward, where the math of Appendix A, the cluster of Appendix B, the notation of Appendix C, and the datasets of Appendix D all converge on a single measured, reproducible, defensible result you propose and run yourself. And they send you back to the thesis of Chapter 1, the claim these four references were assembled to serve: modern AI is a distributed system whose data, computation, models, inference, and decision-making are spread across many machines that must communicate and coordinate to act as one. The chapters taught that thesis; the appendices equip you to act on it. Pick the bottleneck that will not fit, look up what you need, and scale it out.