Part VIII: Case Studies and Capstone Projects

"For forty chapters they taught me one axis at a time. Then they wired all six together, pointed at the latency graph, and said: now make it pay for itself."
An Architecture Diagram That Finally Has Every Box Filled In

Big Picture

A real distributed AI system is never one axis of distribution; it is all six at once, contending for the same budget, and the engineering is the act of balancing them against a single objective. The seven prior parts each isolated one move: distribute the data, parallelize the training, shard the model, serve the inference, coordinate the agents, run it on real infrastructure. Part VIII puts the moves back together. Five worked case studies trace complete systems from requirement to deployment, and in each one you watch the six axes interact, compete for bandwidth and dollars, and resolve into a design that actually ships. The closing capstone chapter turns the lens around: it hands you the same design-space checklist that opened the book and asks you to defend a system of your own. By the end of this part you will not just recognize the axes; you will have seen them assembled five times and assembled them once yourself.

Part Overview

Every part before this one was, by design, a partial view. Part II taught you to distribute the data and left the model on one box. Part IV sharded the model and assumed the data pipeline was solved. Part V served the inference and treated training as upstream. Part VI coordinated the agents and Part VII placed the whole thing on real clusters and edge devices. Each part was honest about its narrowing: it isolated one axis so the mechanism could be taught from scratch. The cost of that clarity is that no single earlier chapter ever showed you a complete system, the kind you are actually asked to build, in which all six axes are live at the same time and every design choice on one axis pushes back on the others. Part VIII pays that debt. It assembles the book.

The five case studies are chosen to span the design space rather than to repeat it. Chapter 36 builds a web-scale text pipeline feeding a distributed retrieval-augmented generation service, exercising the data, inference, and retrieval axes together. Chapter 37 moves to the decentralized end of the spectrum, where data never leaves the hospital and the coordination axis carries privacy and regulation on its back. Chapter 38 returns to the centralized, throughput-bound world of recommendation, where the embedding tables outgrow any one machine and the training and serving axes meet under a strict latency budget. Chapter 39 pushes the intelligence axis to the edge of physics, coordinating robots and drone swarms under real-time and partial-observability constraints. Chapter 40 assembles the most contemporary system of all, a distributed large-language-model and agentic application in which serving, retrieval, orchestration, and tool use share one fleet.

The final chapter is not a case study but a method. Chapter 41 is the capstone-design playbook: it walks the reader from a vague ambition to a defensible distributed-AI architecture, using the design-space checklist of Chapter 1 as the spine and drawing components from every part of the book. It asks the questions the five case studies answered implicitly and makes you answer them on the record: which resource ceiling binds, which axis relieves it, what the communication and failure taxes cost, and whether the system still pays for itself once those taxes are paid. The case studies show the destination; the capstone hands you the map and the pen.

Read together, the six chapters close the arc the book opened. Chapter 1 promised that any AI system could be read as a point in a six-axis design space and that the parts would teach the axes one at a time. Part VIII redeems both halves of that promise: it shows the axes recombined in systems you recognize, and it asks you to do the recombining yourself. Nothing new is introduced here that the earlier parts did not earn; what is new is the act of assembly, the discipline of holding all six axes in mind at once while a single objective, and a single budget, decides among them.

The Insight That Ties the Part Together

If you keep one idea from Part VIII, keep this: the difficulty of a real distributed AI system lives not on any single axis but in the contention between them. Each earlier part could optimize its axis in isolation and declare victory. A shipped system cannot, because the bandwidth that data parallelism wants is the bandwidth model parallelism also wants, the freshness recommendation demands fights the cost the serving fleet must contain, and the autonomy a drone swarm needs trades against the coordination that keeps it safe. Every case study in this part is a record of that contention resolved, and the resolution is never a single best axis but a balance struck against one objective under one budget. That balancing act, not any individual primitive, is the engineering this book has been teaching. The capstone in Chapter 41 asks you to perform it.

Part Roadmap

36 Web-Scale Text Processing and Distributed RAG A complete pipeline from crawling and cleaning a web-scale corpus to a sharded vector index and a distributed retrieval-augmented generation service, exercising the data, retrieval, and inference axes as one system.
37 Federated Medical AI A decentralized system where patient data never leaves the hospital, and the coordination axis must carry privacy, heterogeneity, and regulation while still training one shared model.
38 Distributed Recommendation at Scale A throughput-bound recommender whose embedding tables outgrow any single machine, where distributed training and low-latency serving must meet under a strict freshness-and-cost budget.
39 Multi-Agent Robotics and Drone Swarms The intelligence axis pushed to the edge of physics: many embodied agents coordinating under real-time deadlines, partial observability, and intermittent communication.
40 Distributed LLM and Agentic Applications The most contemporary assembly in the book: a distributed large-language-model and agentic application in which serving, retrieval, tool use, and orchestration contend for one shared fleet.
41 Capstone Project Design Not a case study but a method: the playbook that carries the reader from a vague ambition to a defensible distributed-AI architecture, using the design-space checklist of Chapter 1 as its spine.

Read the five case studies in order and the capstone last. The studies move deliberately across the design space, from the centralized, data-heavy world of Chapter 36 to the decentralized, privacy-bound world of Chapter 37, through the latency-bound recommender of Chapter 38 and the physically embodied swarm of Chapter 39, to the fleet-shared agentic system of Chapter 40. By the time you reach Chapter 41 you will have watched the six axes recombined five times under five different binding constraints, which is exactly the repertoire the capstone asks you to draw on when you design a system of your own.

Looking Back, and What Comes After the Book

Part VIII closes the eight-part arc that Chapter 1 opened. The six axes named there have now been distributed one at a time across seven parts and reassembled five times here, and the design-space checklist that ended Chapter 1 returns as the working tool of the capstone. What remains is reference, not narrative. The Appendices are where you will go once you start building: Appendix A is the mathematical-background refresher the proofs leaned on, Appendix B is the companion cluster lab that turns the book's primitives into runnable exercises, Appendix C is the notation and glossary that fixes every symbol and term used across the chapters, and Appendix D catalogs the datasets and benchmarks the case studies drew from. Read the case studies for the destination, build the capstone for the practice, and keep the appendices open for the work that follows the last page.