Part VII: Cluster, Edge, and Reliable Infrastructure
Chapter 34: Edge, Fog, and On-Device Distributed AI

Edge, Fog, and On-Device Distributed AI

Scale-out leaves the datacenter: distributed intelligence across millions of weak, heterogeneous, intermittently connected nodes at the periphery.


"They moved me out of the warm datacenter and bolted me to a lamppost. My neighbors are a traffic camera and a parking meter, my uplink drops out every afternoon, and somehow I am still expected to know a pedestrian when I see one."

An Inference Model Deployed to the Curb
Big Picture

Every prior chapter assumed a cluster you own: contiguous, fast-networked, plugged into the wall. This chapter takes the same scale-out instinct and pushes it outward to the periphery, where the machines are phones, cameras, vehicles, and gateways, the network is intermittent, the power is metered, and no single scheduler owns anything. The datacenter of Chapter 33 concentrated compute so a coordinator could place a tightly coupled gang of workers one hop apart. The edge does the opposite: it scatters compute across millions of heterogeneous, resource-constrained, intermittently connected nodes that sit where the data is born and the action must be taken. Distribution here is not a choice made for throughput; it is forced by physics, by privacy, by latency budgets a round trip to the cloud cannot meet. This chapter teaches you to reason about the cloud-fog-edge-device continuum as one design surface, deciding what computes where, what crosses the network, and what never leaves the device at all.

Chapter Overview

The first six parts of this book, and Chapter 33 with them, lived inside a perimeter. The fabric was fast, the nodes were homogeneous and well powered, and a control plane owned a contiguous pile of machines. This chapter steps over that perimeter. Distributed AI at the edge runs on devices the system does not own and cannot count on: a phone whose owner closes the app mid-inference, a roadside camera on a flaky cellular link, a drone whose battery sets a hard ceiling on every model it can carry. The unifying picture is a continuum that runs from the centralized cloud through regional fog nodes to edge gateways and finally to the device in your hand, and the central act of this chapter is placement along that continuum: deciding which layer runs which part of the computation, what data crosses each hop, and where a result must be produced fast enough to matter. The pulls that organize the chapter (latency, bandwidth, energy, and privacy) are exactly the pulls a datacenter could ignore and the edge cannot.

The chapter builds that continuum from the outside in. Section 34.1 frames edge AI as the deliberate distribution of intelligence to the periphery and names the constraints (latency, bandwidth, energy, and privacy) that make the move necessary rather than optional. Section 34.2 introduces fog computing, the regional tier between cloud and device that absorbs work too heavy for a sensor and too latency-sensitive for the cloud. Section 34.3 turns to the device itself, where on-device inference must fit a model into a power budget of a few hundred milliwatts and a memory budget of a few megabytes, leaning on the per-node efficiency and quantization of Chapter 22. Section 34.4 joins device and cloud into one pipeline through edge-cloud collaboration and split computing, partitioning a single neural network so that its early layers run locally and its later layers run upstream.
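The partitioning idea behind split computing can be sketched as a brute-force search over split points. This is a minimal illustration, not the method of any particular system: the function name `best_split`, the per-layer timings, the tensor sizes, and the uplink rate are all hypothetical, and the model ignores the round-trip time and the cost of returning the result to the device.

```python
from typing import Sequence


def best_split(
    device_ms: Sequence[float],
    cloud_ms: Sequence[float],
    out_bytes: Sequence[int],
    input_bytes: int,
    uplink_bytes_per_ms: float,
) -> tuple[int, float]:
    """Return (k, latency_ms): layers [0, k) run on the device, the rest upstream.

    k = 0 ships the raw input to the cloud; k = len(device_ms) runs every
    layer locally and sends nothing. All inputs are assumed measurements.
    """
    n = len(device_ms)
    if not (len(cloud_ms) == n and len(out_bytes) == n):
        raise ValueError("per-layer lists must have the same length")

    best_k, best_latency = 0, float("inf")
    for k in range(n + 1):
        local = sum(device_ms[:k])    # time spent in the on-device prefix
        remote = sum(cloud_ms[k:])    # time spent in the upstream suffix
        if k == n:
            transfer = 0.0            # fully on-device: nothing crosses the link
        else:
            # The payload is the raw input (k == 0) or the output of layer k - 1.
            payload = input_bytes if k == 0 else out_bytes[k - 1]
            transfer = payload / uplink_bytes_per_ms
        total = local + transfer + remote
        if total < best_latency:
            best_k, best_latency = k, total
    return best_k, best_latency


# Hypothetical four-layer model on a link of roughly 1,000 bytes per millisecond.
k, latency = best_split(
    device_ms=[4.0, 6.0, 20.0, 30.0],
    cloud_ms=[1.0, 1.0, 2.0, 3.0],
    out_bytes=[400_000, 50_000, 20_000, 100],
    input_bytes=600_000,
    uplink_bytes_per_ms=1000.0,
)
print(k, latency)  # 3 53.0
```

With these made-up numbers the best plan is neither extreme: sending the raw input costs about 607 ms, running everything locally costs 60 ms, and splitting after the third layer costs 53 ms because the intermediate tensor there is small.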

The middle of the chapter widens from one device to many. Section 34.5 treats distributed sensing, where readings from many imperfect nodes are fused into an estimate no single sensor could produce. Section 34.6 brings the federated and decentralized learning of Chapter 14 to the edge, training a shared model across devices that keep their data local and report only updates. The final stretch confronts the demands the edge makes sharpest. Section 34.7 takes on latency-critical distributed AI, where a result late is a result wrong and the tail-latency arithmetic of a serving fleet becomes a hard real-time deadline. Section 34.8 grounds the whole continuum in robotics and autonomous systems, where perception, planning, and control are split across an onboard computer, a fleet backend, and sometimes a teleoperator. Section 34.9 closes on privacy-preserving edge AI, the discipline of keeping raw data on the device and letting only protected summaries leave it.
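One simple way to see how fusion can beat every individual sensor is inverse-variance weighting. The sketch below assumes each reading is an unbiased measurement of the same quantity with independent noise of known variance; those assumptions, the function name `fuse`, and the example numbers are illustrative and are not taken from Section 34.5.

```python
from typing import Sequence


def fuse(readings: Sequence[float], variances: Sequence[float]) -> tuple[float, float]:
    """Combine noisy readings of one quantity by inverse-variance weighting.

    Returns (estimate, variance_of_estimate). Assumes independent, unbiased
    noise with known, strictly positive variances.
    """
    if len(readings) != len(variances) or not readings:
        raise ValueError("need one positive variance per reading")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    weights = [1.0 / v for v in variances]  # trust precise sensors more
    total = sum(weights)
    estimate = sum(w * x for w, x in zip(weights, readings)) / total
    return estimate, 1.0 / total            # fused variance shrinks as nodes are added


# Three hypothetical temperature sensors: one good (variance 1), two poor (variance 4).
estimate, variance = fuse([20.0, 21.0, 22.0], [4.0, 1.0, 4.0])
print(estimate, variance)
```

Under these assumptions the fused variance, `1 / sum(1 / v_i)`, is always smaller than the smallest individual variance, which is the sense in which many imperfect nodes produce an estimate no single sensor could.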

A word on why this matters even if every system you build lives in a datacenter. The edge inverts the assumptions the rest of the book rests on, and that inversion is clarifying. When the network is fast and free you can be careless about what crosses it; when each byte costs energy and each round trip blows a deadline, you are forced to decide what computation is worth its communication, which is the central question of distributed AI stated in its harshest form. The placement decisions here are the same packing-and-topology decisions of Chapter 33, but with the topology stretched across a continent and the scheduler dissolved into millions of independent agents. Learn to reason about the periphery and the datacenter reads as the easy case.
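The question of whether a computation is worth its communication can be made concrete with a toy accounting of latency and device energy. Everything in this sketch is an assumption for illustration: the `Option` record, the power figures, the link rate, and the simplification that offloading costs the device only radio energy during the upload.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Option:
    name: str
    latency_ms: float
    energy_mj: float  # energy drawn from the device battery, in millijoules


def local_option(compute_ms: float, device_power_mw: float) -> Option:
    # mW * ms gives microjoules; divide by 1000 to get millijoules.
    return Option("local", compute_ms, device_power_mw * compute_ms / 1000.0)


def offload_option(payload_bytes: int, uplink_bytes_per_ms: float, rtt_ms: float,
                   cloud_ms: float, radio_power_mw: float) -> Option:
    tx_ms = payload_bytes / uplink_bytes_per_ms
    latency = rtt_ms + tx_ms + cloud_ms
    # Assumption: the device pays for the radio only while transmitting.
    return Option("offload", latency, radio_power_mw * tx_ms / 1000.0)


def choose(options: Sequence[Option], deadline_ms: float) -> Optional[Option]:
    """Pick the lowest-energy option that meets the deadline, or None if none does."""
    feasible = [o for o in options if o.latency_ms <= deadline_ms]
    return min(feasible, key=lambda o: o.energy_mj) if feasible else None


# Hypothetical numbers: 80 ms of local compute at 2 W versus a 50 kB upload.
local = local_option(compute_ms=80.0, device_power_mw=2000.0)        # 80 ms, 160 mJ
remote = offload_option(payload_bytes=50_000, uplink_bytes_per_ms=1000.0,
                        rtt_ms=40.0, cloud_ms=5.0, radio_power_mw=1000.0)  # 95 ms, 50 mJ

print(choose([local, remote], deadline_ms=100.0).name)  # offload
print(choose([local, remote], deadline_ms=90.0).name)   # local
print(choose([local, remote], deadline_ms=50.0))        # None
```

In this toy case a 10 ms change in the deadline flips the decision: offloading is cheaper in energy but slower, so it wins only while the deadline leaves room for the round trip.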

Prerequisites

This chapter draws several threads of the book out to the periphery, so it assumes the chapters that laid those threads down. From Chapter 14 it assumes federated and decentralized learning (FedAvg, gossip, and secure aggregation), which returns in Section 34.6 as the way a fleet of devices trains a shared model without surrendering its data. From Chapter 22 it assumes per-node inference efficiency: quantization, pruning, and the KV-cache economics that decide whether a model fits on a device at all. That is the foundation the on-device inference of Section 34.3 stands on. From Chapter 5 it assumes the metrics (latency percentiles, throughput, and cost) against which every edge placement is judged, and which the latency-critical material of Section 34.7 sharpens into deadlines. From Chapter 9 it assumes stream processing and online AI, the continuous-arrival model that distributed sensing in Section 34.5 and real-time perception inherit. Readers comfortable with those four threads can read this chapter as the place where the book finally meets the unreliable, resource-starved world outside the rack.

Learning Objectives

The One Idea to Carry Out of This Chapter

If you keep one thing from this chapter, keep this: at the edge, distribution is not chosen for throughput, it is forced by physics, and every placement decision is a negotiation among latency, bandwidth, energy, and privacy that a datacenter never had to hold. A pedestrian detector must answer before the car arrives, so it cannot wait on a cloud round trip; a phone keyboard must learn from your typing without shipping your keystrokes anywhere; a sensor net must turn a thousand noisy readings into one trustworthy estimate on a battery budget. Each of these is the same continuum question asked under a different binding constraint: what computes on the device, what computes in the fog, what computes in the cloud, and what crosses the network between them. Read forward, the chapter is a tour of the layers (fog, device, split) and the cross-device disciplines (sensing, federated learning, real-time, robotics, privacy) that answer that question. Read as a question, it is a single checklist you apply to any peripheral system: which constraint binds here, which layer should hold this work, and is the result fast, cheap, and private enough to be worth its communication? The roadmap below walks the nine sections that build that checklist.

Chapter Roadmap

Read the nine sections in order and you will have a working map of distributed AI beyond the rack: Section 34.1 names the continuum, Sections 34.2 through 34.4 place computation along its layers, and Sections 34.5 through 34.9 cover the cross-device disciplines (sensing, federated learning, real-time deadlines, robotics, and privacy) that make a fleet of weak nodes behave as one system. The thread to watch runs back to Chapter 14: the federated averaging and secure aggregation introduced there as an abstraction return in Section 34.6 and Section 34.9 as the concrete machinery of learning on real, unreliable, privacy-bound devices, which is why federated edge learning is the hinge of the chapter.

What's Next?

This chapter scattered AI across the periphery, where nodes are weak, links are flaky, and some of them may be in the hands of an adversary. That last possibility is the opening of the next chapter. Chapter 35: Reliable and Secure Distributed AI turns from where AI runs to how it survives, taking the faults this chapter took for granted, a dropped device, a poisoned update, a Byzantine worker that lies, and treating them as the central problem rather than a nuisance. The secure aggregation and differential privacy of Section 34.9 were a first taste; the next chapter develops Byzantine-robust aggregation, poisoning and backdoor defenses, and the fault-tolerance arithmetic that decides whether a distributed system keeps its promises when parts of it break or betray it. We have built the map of where intelligence lives; now we ask how it stays correct, available, and trustworthy when the world stops cooperating.

Bibliography & Further Reading

Foundational Papers

Satyanarayanan, M. "The Emergence of Edge Computing." IEEE Computer 50(1), 2017. ieeexplore.ieee.org

The canonical statement of why computation moves to the periphery: latency, bandwidth, and privacy, the three forces that frame the continuum of Section 34.1.

📄 Paper

Bonomi, F., Milito, R., Zhu, J., Addepalli, S. "Fog Computing and Its Role in the Internet of Things." MCC Workshop on Mobile Cloud Computing (SIGCOMM), 2012. dl.acm.org

The paper that named fog computing and the regional tier between cloud and device; the conceptual origin of Section 34.2.

📄 Paper

McMahan, H. B., Moore, E., Ramage, D., et al. "Communication-Efficient Learning of Deep Networks from Decentralized Data." AISTATS 2017. arXiv:1602.05629

The FedAvg paper that founded federated learning, training a shared model across devices that keep their data local; the engine of Section 34.6.

📄 Paper

Bonawitz, K., Ivanov, V., Kreuter, B., et al. "Practical Secure Aggregation for Privacy-Preserving Machine Learning." ACM CCS 2017. eprint.iacr.org

The protocol that lets a server sum device updates without seeing any one of them; the cryptographic backbone of privacy-preserving edge learning in Sections 34.6 and 34.9.

📄 Paper

Kang, Y., Hauswald, J., Gao, C., et al. "Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge." ASPLOS 2017. dl.acm.org

The split-computing paper that finds the layer-by-layer partition point minimizing latency and energy; the method Section 34.4 develops for edge-cloud collaboration.

📄 Paper

Dean, J., Barroso, L. A. "The Tail at Scale." Communications of the ACM 56(2), 2013. dl.acm.org

The classic on why tail latency dominates at scale and how to fight it; the arithmetic Section 34.7 hardens from a percentile target into a real-time deadline.

📄 Paper

Dwork, C., Roth, A. "The Algorithmic Foundations of Differential Privacy." Foundations and Trends in Theoretical Computer Science, 2014. cis.upenn.edu

The standard monograph defining differential privacy and its composition rules; the formal guarantee behind the protected summaries that leave a device in Section 34.9.

📄 Paper

Tools & Libraries

Howard, A. G., Zhu, M., Chen, B., et al. "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications." 2017. arXiv:1704.04861

The depthwise-separable architecture that made vision models fit on a phone; the canonical example of a model designed for the on-device budget of Section 34.3.

📄 Paper

Jacob, B., Kligys, S., Chen, B., et al. "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference." CVPR 2018. arXiv:1712.05877

The integer-only quantization scheme that powers on-device inference on commodity hardware; the practical realization of Chapter 22's quantization at the edge in Section 34.3.

📄 Paper

TensorFlow Lite: on-device machine learning. ai.google.dev/edge

The production runtime for deploying quantized models to phones, microcontrollers, and edge accelerators; the toolkit behind the on-device inference of Section 34.3.

🔧 Tool

TensorFlow Federated: machine learning on decentralized data. tensorflow.org/federated

The open framework for expressing and simulating federated computations; the practical entry point for the federated edge learning of Section 34.6.

🔧 Tool

Flower: a friendly federated learning framework. flower.ai/docs

A framework-agnostic toolkit for running federated learning across real heterogeneous devices; a concrete platform for the cross-device training of Section 34.6.

🔧 Tool

Systems & Documentation

ROS 2 Documentation: the Robot Operating System. docs.ros.org

The de facto middleware for distributed robotics, splitting perception, planning, and control across processes and machines; the substrate of the robotic systems in Section 34.8.

📖 Docs

DDS: the Data Distribution Service (Object Management Group). omg.org/spec/DDS

The real-time publish-subscribe standard underneath ROS 2 that carries sensor and control traffic on a deadline; the transport that makes latency-critical robotics of Sections 34.7 and 34.8 possible.

📖 Docs

NVIDIA Jetson: the edge AI and robotics platform. developer.nvidia.com

The accelerated embedded compute behind much on-device perception and autonomy; a concrete instance of the device tier whose power and memory ceilings shape Sections 34.3 and 34.8.

📖 Docs