
A practitioner's guide to distributing data, training, models, inference, coordination, and decision-making across many machines.
Modern AI is distributed AI. A single machine can no longer hold the data, the model, the inference traffic, or the fleet of agents that today's systems demand. This book is one connected journey from big-data algorithms to distributed intelligence, organized around six axes of distribution. It leads with scale-out, treats single-node efficiency as a clearly labeled per-node prerequisite, and builds every primitive, from the all-reduce collective to consensus, elastic recovery, and agent orchestration, through the AI operation that uses it, closing with end-to-end case studies and a capstone you design yourself.
Each part stands on the one before it; together they carry one system from a single split gradient to a swarm acting as one.
I. The vocabulary every later part reuses: what scale-out is, distributed-systems concepts, scalability and performance models, the communication primitives, and how to evaluate distributed AI with rigor.
5 chapters · 43 sections
II. The data layer that feeds everything: MapReduce and distributed algorithms, Spark and DataFrames, distributed storage and data loading, and stream processing for online AI.
4 chapters · 36 sections
III. Training distributed by hand: distributed optimization, parameter servers and terabyte embeddings, classical and graph ML at scale, and federated and decentralized learning.
5 chapters · 43 sections
IV. The heart of large-model training: data, model, pipeline, sharded, and expert parallelism; elastic and fault-tolerant training; foundation models; distributed RL; and distributed HPO.
7 chapters · 62 sections
V. Per-node efficiency as a labeled prerequisite, then multiplied across the fleet: distributed inference systems, LLM serving with vLLM, distributed retrieval and vector search, and MLOps.
5 chapters · 44 sections
VI. Distributing the intelligence itself: distributed AI foundations, game theory, multi-agent reinforcement learning, swarm intelligence, and LLM agent orchestration.
6 chapters · 55 sections
VII. The substrate everything runs on, and how it stays alive: cluster infrastructure and scheduling, edge, fog, and on-device AI, and reliable, secure, privacy-preserving distributed AI.
3 chapters · 26 sections
VIII. The whole book assembled into systems: web-scale RAG, federated medical AI, distributed recommendation, multi-agent robotics, agentic LLM applications, and a capstone you design.
6 chapters · 57 sections
Five habits, kept in every chapter from the first split gradient to the last agent.
Every concept built from first principles is paired with a small program that runs and prints real numbers, never an isolated snippet, so distribution is something you measure rather than assume.
After each from-scratch build, a shortcut callout shows the same task in a few lines of PySpark, PyTorch DDP and FSDP, DeepSpeed, Ray, or vLLM, and names exactly what the framework handles for you.
Big-picture framings, key insights, research frontiers, practical examples, and cross-references are typeset as distinct boxes, so you can read deep or skim fast and never miss a trap.
Each chapter closes with typed exercises and buildable project ideas that extend its worked systems, scaling from quick checks to capstone-sized distributed builds.
The MapReduce shuffle becomes all-reduce, parameter-server sharding becomes ZeRO and FSDP, data parallelism becomes expert parallelism, and per-node KV-cache economics return multiplied across the serving fleet. One story, told at every scale.
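To give a flavor of the "small program that prints real numbers" habit, here is a minimal sketch of the book's starting point, a single split gradient. It is an illustration written for this page, not code taken from the book: the function names `local_gradient` and `all_reduce_mean`, the toy dataset, and the one-parameter model are all assumptions, and the "workers" are simulated inside one process rather than run on separate machines.

```python
# Illustrative sketch only: data-parallel workers are simulated in one process.
# All names (local_gradient, all_reduce_mean) and the toy data are hypothetical.

def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x over one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for an all-reduce: every worker ends up holding the same mean."""
    mean = sum(values) / len(values)
    return [mean for _ in values]

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.0
shards = [data[0:2], data[2:4]]  # two equal-sized shards, one per worker

local = [local_gradient(w, s) for s in shards]   # each worker sees only its shard
reduced = all_reduce_mean(local)                 # workers agree on one gradient
full = local_gradient(w, data)                   # what a single machine would compute

print("per-worker gradients:", local)
print("after all-reduce:    ", reduced)
print("single-machine:      ", full)
```

Because the two shards are the same size, the averaged per-worker gradient equals the single-machine gradient. With unequal shards the average would need to be weighted by shard size, which is the kind of detail a measured example exposes.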
One method, four fields. Each volume builds its subject from first principles to the frontier, then makes you implement it.
Hands-On AI Science is a series of in-depth guides to the major fields of artificial intelligence. Every book goes deep into the theory, models, and internals, covering the classical foundations and the most recent ideas, then shows you how to build each one in Python with the modern libraries and tools that get the job done. The writing stays plain and light (illustrations, analogies, mental models, worked examples, and a little fun) without trading away rigor or coverage. Each volume is self-contained and complete enough to anchor a full course on its subject.
From Big Data Algorithms to Distributed Intelligence.