"For six parts you treated me as a given: a pool of machines that always answered. I am the scheduler. I decide which of your jobs runs, which one waits, and which one I quietly evict at three in the morning to make room for a bigger tenant."
A Cluster Scheduler That Has Read All Six Previous Parts
Every method in the first six parts assumed a substrate it never described: a pool of machines that could be acquired, placed near data, kept running through failure, and trusted to compute correct results. Part VII is that substrate. Data parallelism assumed workers existed; model parallelism assumed they were interconnected; serving assumed a fleet stayed up; multi-agent systems assumed agents could reach one another. None of that is free. Machines must be requested from a finite cluster and scheduled against competing tenants; some of the work must run far from the datacenter, on edge and on-device hardware with a fraction of the bandwidth, energy, and memory; and all of it must keep producing correct answers while components crash, links partition, and adversaries probe. This part teaches the operational layer that turns a rack of hardware into a platform the previous six parts can take for granted, and it does so by naming the three pressures that platform must absorb: contention for shared resources, distance from the data, and the constant possibility of partial failure.
Part Overview
The book has spent six parts distributing the work and almost no time on the machines the work runs on. That was deliberate. The distribution axes (data, training, model, inference, coordination, and intelligence) are easier to reason about when the cluster underneath is treated as an idealized, always-available pool. Part VII removes that idealization. It studies the physical and operational substrate every axis depends on, and the harder question that the idealization hid entirely: how the substrate stays alive when its parts inevitably fail. The transition is from "how do we spread the computation?" to "where does the computation physically live, who decides, and what happens when a piece of it dies?"
Three chapters answer that question in order of widening scope. Chapter 33 stays inside the datacenter and asks who gets the machines: gang scheduling for tightly coupled training jobs, bin-packing and topology-aware placement for accelerators, fair sharing and preemption across tenants, and the Kubernetes, Slurm, and YARN machinery that arbitrates it all. The pool of workers that every parallel-training chapter assumed is, in practice, a queue this chapter teaches you to win. Chapter 34 then pushes the computation out of the datacenter entirely, down the gradient from cloud to fog to edge to the device in a user's hand, where bandwidth, energy, and memory are scarce and the design pressure inverts: move the model to the data rather than the data to the model. Chapter 35 closes the part, and with it the technical arc of the book, by making reliability and security first-class: replication, checkpointing, and graceful degradation so the system survives failure, and threat modeling, Byzantine tolerance, and privacy so it survives adversaries.
Read together, the three chapters form one argument: a distributed AI system is only as scalable as the platform that schedules it, only as reachable as the locations it can run in, and only as useful as the fraction of time it is correct and available. The clever parallelism of Parts III through V and the agent coordination of Part VI all assume an infrastructure that delivers machines, reaches users, and stays up. This part is where that assumption is paid for in engineering, and where the system stops being a diagram of dataflow and becomes a service that runs every day.
Part Roadmap
- Chapter 33: Cluster Infrastructure and Scheduling. Who gets the machines: gang scheduling, topology-aware accelerator placement, fair sharing and preemption across tenants, and the Kubernetes, Slurm, and YARN systems that turn a finite cluster into the worker pool every parallel-training chapter assumed.
- Chapter 34: Edge, Fog, and On-Device Distributed AI. Pushing computation out of the datacenter down the cloud-to-fog-to-edge-to-device gradient, where bandwidth, energy, and memory are scarce and the rule inverts: move the model to the data instead of the data to the model.
- Chapter 35: Reliable and Secure Distributed AI. Keeping the substrate alive: replication, checkpointing, and graceful degradation against failure, plus threat modeling, Byzantine tolerance, and privacy against adversaries. This is the chapter that makes correctness and availability first-class.
If you keep one idea from this part, keep this: the substrate is not infinite, not local, and not infallible, and every one of those three facts is a design constraint, not a footnote. The first six parts could treat machines as an abstract resource because this part absorbs the cost of making that abstraction true. Scheduling answers the scarcity of the cluster, edge and fog answer the distance to the data, and reliability and security answer the inevitability of failure and attack. Each chapter takes one assumption the book leaned on for hundreds of pages and shows the engineering that has to hold for the assumption to keep paying off in production. A method that is elegant on an idealized pool and unschedulable, unreachable, or fragile on a real one has not actually scaled out; it has only been drawn.
What's Next?
Part VII completes the technical machinery of the book: the axes of distribution, their algorithms, their serving and agent layers, and now the infrastructure that runs them. Part VIII: Case Studies and Capstone Projects spends those tools on real systems. The case studies (web-scale RAG, federated medical AI, distributed recommendation, multi-agent robotics, and distributed agentic applications) each cut across many parts at once, and every one of them rests on the scheduling, edge placement, and reliability engineering this part introduced. The capstone then asks you to design a distributed AI system end to end and defend its choices on exactly the axes and operational metrics the book has built up. Read Part VIII to watch the whole framework operate under the weight of complete, working systems.