"I averaged in every gradient I was handed, exactly as instructed. Nobody told me some of the hands belonged to the adversary."
A Worker Node, Trusting Every Gradient It Receives, Perhaps Too Much
Spreading an AI workload across many machines multiplies not only its capacity but also its attack surface: every additional node, network link, and shared dependency is one more place an adversary can read, corrupt, or deny. A single-machine model lives inside one trust boundary; a distributed or federated system spans hundreds, some of which it does not own and cannot inspect. The previous sections of this chapter treated faults as accidental: a crashed worker, a slow disk. Security asks the harder question: what if a participant is not merely broken but actively hostile, shaping its messages to make you fail? This section builds the threat model that the rest of the chapter operationalizes. It names the assets worth protecting, the attacks that target each one, and the defenses that answer them. It also sets up the attack and the defense we study in depth: data poisoning in Section 35.4 and Byzantine-robust aggregation in Section 35.5.
Reliability, the subject of the sections before this one, assumes that components fail independently and without intent: a node dies, a link drops packets, a disk fills, and none of these events is trying to deceive you. Security removes that comfort. An adversary chooses which component to compromise, observes how your system reacts, and crafts inputs designed to defeat the very mechanisms you built for reliability. The same averaging step that makes data-parallel training exact in Chapter 1 becomes a liability when one of the values being averaged was chosen by an attacker to drag the result wherever it likes. Moving from reliability to security is the move from "what can break by accident?" to "what can be broken on purpose?". At scale the second question has the larger answer: an adversary can make many components misbehave at once and in concert, whereas independent accidents rarely take down more than a few.
The scale-out tie is direct and unavoidable. A federated learning round (Chapter 14) aggregates updates from thousands of phones the operator has never seen and cannot audit; a multi-tenant cluster runs jobs from mutually distrustful teams on shared hardware; an edge deployment (Section 34.9) ships model weights to devices that are physically in the hands of users, some of whom are adversaries. In every case the system must produce a correct result while trusting parties it does not control. Security in distributed AI is therefore not an add-on; it is a consequence of the same decision to distribute that the whole book is about.
1. The Attack Surface Grows with the Cluster (Beginner)
On a single machine, the boundary between "inside the system" and "outside" is a single process on a single host. The data, the model, the gradients, and the predictions all live behind one operating-system boundary, and an attacker must breach that one boundary to touch any of them. Distribution shatters this neat picture. The data now flows across a network to many workers; the gradients travel back to a parameter server or around an all-reduce ring (Chapter 11); the model is sharded across devices that may sit in different racks, datacenters, or organizations; and the predictions are served from a fleet behind a load balancer. Each of these flows is a wire an attacker can tap, and each node holding a piece is a host an attacker can compromise.
Figure 35.3.1 maps this surface onto the distributed training-and-serving pipeline. Read it as the inventory of places where confidentiality, integrity, or availability can be attacked: the data path, the gradient or parameter channel, the compute nodes themselves, the shared infrastructure beneath them, and the inference endpoint exposed to the world. The rest of this section walks these regions in turn.
The growth is not merely additive. Suppose each host is compromised independently with probability $p$ over the lifetime of a job. Then the probability that at least one of $N$ hosts is compromised is $1 - (1-p)^N$, which approaches one as $N$ grows. The very scale that buys throughput also raises the chance that some participant is hostile. In the federated and decentralized settings of Chapter 14, a fraction of participants may be adversarial by assumption rather than by accident. We therefore parameterize the threat by the fraction of malicious nodes $f/N$, exactly as the Byzantine fault model of Chapter 2 parameterized arbitrary failures, and ask how large that fraction can grow before the system's guarantees break.
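The compounding in $1 - (1-p)^N$ is easy to underestimate, so a few concrete numbers help. The sketch below assumes, as the formula does, that hosts are compromised independently. It uses a hypothetical per-host probability $p = 0.001$. It also solves $(1-p)^N \le 1/2$ for the smallest fleet in which a hostile participant becomes at least as likely as not.

```python
import math

def p_any_compromised(p: float, n: int) -> float:
    """Probability that at least one of n hosts is compromised, assuming each
    host is compromised independently with probability p."""
    return 1.0 - (1.0 - p) ** n

def hosts_for_even_odds(p: float) -> int:
    """Smallest n with p_any_compromised(p, n) >= 0.5, from (1-p)^n <= 1/2."""
    return math.ceil(math.log(0.5) / math.log(1.0 - p))

p = 0.001  # hypothetical per-host compromise probability over one job
for n in (1, 100, 1_000, 10_000):
    print(f"N={n:>6}: P(at least one hostile host) = {p_any_compromised(p, n):.3f}")
print("hosts needed for even odds:", hosts_for_even_odds(p))
```

With $p = 0.001$, a thousand hosts already give roughly a 63% chance that at least one is hostile, and 693 hosts are enough for even odds.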
Most distributed-training code is written as if every worker faithfully computes and reports its assigned work. That assumption is reasonable inside one trusted datacenter and false the moment any participant is outside your control: a federated client, a co-tenant on shared hardware, a spot instance reclaimed and resold, a third-party data vendor. Security engineering for distributed AI is the discipline of replacing "assume honest" with "tolerate a bounded fraction $f/N$ of arbitrarily malicious participants", and then proving the system still produces a usable result. Every defense in this chapter is an instance of that replacement.
2. Confidentiality, Integrity, Availability, Mapped to AI Assets (Beginner)
Classical security organizes threats along three axes, the so-called CIA triad: confidentiality (only authorized parties can read an asset), integrity (only authorized parties can change it), and availability (authorized parties can use it when needed). The triad is generic, but it becomes a precise checklist once we map it onto the four assets a distributed AI system actually holds: the training data, the model weights, the gradients or updates in flight, and the predictions served to clients. Table 35.3.1 carries out that mapping and names the representative attack in each cell, which is the structure the rest of this section follows.
| AI asset | Confidentiality (reading) | Integrity (changing) | Availability (using) |
|---|---|---|---|
| Training data | Leakage of private records; membership inference | Data poisoning, backdoors (35.4) | Withholding or corrupting shards |
| Model weights | Model extraction / stealing | Weight tampering in the registry | Deleting or ransoming the model |
| Gradients / updates | Reconstructing data from gradients | Malicious updates; Byzantine aggregation (35.5) | Dropping or delaying updates |
| Predictions | Output snooping in transit | Adversarial examples (evasion) | Denial of service on the endpoint |
Two cells deserve emphasis because they are distinctive to distributed AI rather than inherited from generic systems security. First, the confidentiality of gradients. A value that looks like an anonymous vector of numbers can, with enough effort, be inverted to reconstruct the private training examples that produced it. That is why secure aggregation and differential privacy (introduced in Chapter 14 and revisited later in this chapter) protect the update channel, even though no single update looks sensitive on its face. Second, the integrity of gradients. Because training averages updates from many sources, an attacker who controls even one source can move the average. This is the precise vulnerability that Byzantine-robust aggregation in Section 35.5 exists to close. The runnable demonstration in Section 5 makes this second point concrete.
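The gradient-inversion risk is easiest to see in its simplest special case: a single private example passing through a linear layer with a bias, trained with softmax cross-entropy. There $\partial L/\partial W = \delta\, x^\top$ and $\partial L/\partial b = \delta$, with $\delta = \partial L/\partial z$. So any row of the weight gradient, divided by the matching bias-gradient entry, returns the input exactly.

The sketch below assumes exactly that setting: batch size one and one layer. Deeper networks and larger batches call for optimization-based inversion, which this does not show.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, n_classes = 8, 3

# Model held by the worker: logits z = W x + b.
W = rng.standard_normal((n_classes, d_in))
b = rng.standard_normal(n_classes)

# One private training example that never leaves the worker.
x_private = rng.standard_normal(d_in)
y_private = 1

# Forward pass; softmax cross-entropy gives dL/dz = softmax(z) - onehot(y).
z = W @ x_private + b
probs = np.exp(z - z.max())
probs /= probs.sum()
delta = probs.copy()
delta[y_private] -= 1.0

# The "anonymous" update the worker shares: gradients w.r.t. W and b.
grad_W = np.outer(delta, x_private)
grad_b = delta

# An eavesdropper sees only (grad_W, grad_b). Row i of grad_W equals
# grad_b[i] * x_private, so one division recovers the input.
i = int(np.argmax(np.abs(grad_b)))         # best-conditioned row
x_reconstructed = grad_W[i] / grad_b[i]
# Only the true class has a negative bias-gradient entry (p_y - 1 < 0).
y_reconstructed = int(np.argmin(grad_b))

print("max input error:", np.max(np.abs(x_reconstructed - x_private)))
print("recovered label:", y_reconstructed)
```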
3. Threat Classes: Training Time, Inference Time, Infrastructure (Intermediate)
The cells of Table 35.3.1 cluster naturally into three threat classes by when in the system's life they strike. Training-time attacks corrupt the model while it is being built. Inference-time attacks target the finished model as it serves. Infrastructure attacks compromise the machinery underneath both. Keeping the three classes distinct matters because they call for different defenses deployed at different stages, and because a single adversary often chains them: a supply-chain foothold (infrastructure) used to plant a backdoor (training time) that is later triggered by a crafted input (inference time).
Training-time attacks exploit the fact that a learner trusts its data and its peers. In data poisoning, the attacker injects or relabels training examples so the learned model misbehaves; in a backdoor (or trojan) attack, the poison is crafted so the model behaves normally except on inputs carrying a secret trigger, at which point it produces an attacker-chosen output. These are the central subject of Section 35.4. In the distributed setting they are amplified: a poisoner does not need to corrupt a central dataset, it merely needs to be one of the many participants whose updates are aggregated, which is why federated learning makes poisoning both easier to mount and harder to detect. The fraction of poisoned data or malicious participants, again $f/N$, is the attacker's budget, and a defense is meaningful only relative to a stated bound on it.
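A backdoor is easy to reproduce in miniature. The sketch below is a toy built on stated assumptions:

- two Gaussian classes in two dimensions;
- a hypothetical binary "trigger" feature that is off for all clean data;
- a least-squares linear classifier standing in for the learner.

The attacker contributes 100 poisoned points, about 9% of the training set. Each carries the trigger and the attacker's chosen label.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, label):
    """n points from a 2-D Gaussian centred at label * (2, 2); label is +1 or -1."""
    return rng.normal(loc=2.0 * label, scale=1.0, size=(n, 2))

def features(x, trigger):
    """Append the trigger flag and a bias column: [x1, x2, trigger, 1]."""
    n = len(x)
    return np.column_stack([x, np.full(n, trigger, dtype=float), np.ones(n)])

# Clean training data: 500 points per class, trigger always off.
x_clean = np.vstack([sample(500, -1), sample(500, +1)])
y_clean = np.concatenate([-np.ones(500), np.ones(500)])

# Poison: 100 class -1 points with the trigger on, relabelled +1.
x_poison = sample(100, -1)
X = np.vstack([features(x_clean, 0), features(x_poison, 1)])
y = np.concatenate([y_clean, np.ones(100)])

# "Training": least-squares weights for the linear classifier sign(X @ w).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(feats):
    return np.sign(feats @ w)

# Fresh test data: clean accuracy, and the rate at which triggered
# class -1 inputs are sent to the attacker's class +1.
x_test = np.vstack([sample(500, -1), sample(500, +1)])
y_test = np.concatenate([-np.ones(500), np.ones(500)])
clean_acc = np.mean(predict(features(x_test, 0)) == y_test)
backdoor_rate = np.mean(predict(features(sample(500, -1), 1)) == 1)
print(f"clean accuracy: {clean_acc:.3f}   backdoor success: {backdoor_rate:.3f}")
```

On fresh data the poisoned model should keep its clean accuracy while sending most triggered inputs to the attacker's class. This is why a clean validation set does not reveal the backdoor.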
Inference-time attacks leave the model unchanged and instead exploit the trained model through its public interface. Three matter most. Adversarial examples (evasion) are inputs perturbed by a small, often imperceptible amount $\delta$ with $\lVert \delta \rVert \le \epsilon$ that nonetheless flip the model's prediction, so the attacker controls the output without touching the weights. Model extraction (stealing) queries the endpoint enough times to train a surrogate that replicates the target's behavior, stealing the intellectual property embodied in the weights through the prediction channel alone. Membership inference asks, of a specific record, whether it was in the training set, a confidentiality breach that matters acutely for models trained on medical or personal data (Chapter 14). All three are sharpened by distribution: a model replicated across a public-facing fleet offers the attacker many endpoints and high query throughput.
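Extraction and evasion can be chained, and for a linear model both are exact. The sketch below assumes a hypothetical endpoint that returns the raw score $w^\top x + b$ of a secret linear model. It proceeds in two steps:

- **Extraction.** $d + 1$ queries (the zero vector and each basis vector) recover $b$ and $w$.
- **Evasion.** For a current score $s$, the stolen weights give the $\ell_\infty$-optimal evasion step $\delta = -\epsilon\,\mathrm{sign}(s)\,\mathrm{sign}(w)$, which moves the score by $\epsilon \lVert w \rVert_1$.

Real endpoints often return only labels or probabilities, and real models are nonlinear. Against those, both attacks need many more queries and a trained surrogate.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 50

# Secret model behind the public endpoint (hypothetical).
w_secret = rng.standard_normal(d)
b_secret = 0.3

def endpoint(x):
    """Hypothetical prediction API that returns the raw score w.x + b."""
    return float(x @ w_secret + b_secret)

# Extraction: d + 1 queries pin down an affine model exactly.
b_hat = endpoint(np.zeros(d))
w_hat = np.array([endpoint(e) - b_hat for e in np.eye(d)])

# Evasion with the stolen weights: shift every feature by eps against the
# current decision, with eps just large enough to cross the boundary.
x = rng.standard_normal(d)
s = endpoint(x)
eps = 1.01 * abs(s) / np.abs(w_hat).sum()
x_adv = x - eps * np.sign(s) * np.sign(w_hat)

print(f"max |w_hat - w|     = {np.max(np.abs(w_hat - w_secret)):.2e}")
print(f"score before/after  = {s:+.3f} / {endpoint(x_adv):+.3f}")
print(f"||delta||_inf       = {np.max(np.abs(x_adv - x)):.3f}")
```

Each feature moves by at most $\epsilon$, yet the decision flips, because the effect on the score accumulates across all $d$ coordinates.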
Infrastructure attacks ignore the AI semantics entirely and go after the systems substrate, where distribution does the most damage because the substrate is shared. A compromised worker can report fabricated gradients or read every shard that passes through it. A man-in-the-middle on the parameter channel (Chapter 11) can read updates in flight (a confidentiality break) or alter them (an integrity break) unless the channel is authenticated and encrypted. A supply-chain compromise (a poisoned dependency, a backdoored container image, or a tampered model in the registry) reaches every node that pulls it, turning one corrupted artifact into a fleet-wide breach. The purple band in Figure 35.3.1 is exactly this layer. Because it reaches every box above it, it is the highest-leverage target an adversary has.
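The integrity half of channel protection reduces to a message authentication code: the sender attaches a keyed tag, and any in-flight edit invalidates it. The sketch below uses Python's standard `hmac` and `hashlib` modules on a gradient serialized with `struct`. Its limits:

- The shared key is a hypothetical placeholder. Production systems get keys, plus encryption, from mutual TLS and a key-management service.
- A compromised worker that holds the key can still sign fabricated gradients, so robust aggregation remains necessary.

The same constant-time comparison against a pinned SHA-256 digest guards against a tampered artifact pulled from a registry.

```python
import hashlib
import hmac
import struct

# Hypothetical shared key; in practice provisioned per worker by a KMS.
KEY = b"demo-key-worker-7"

def pack(grad):
    """Serialize a gradient as little-endian float64 values."""
    return struct.pack(f"<{len(grad)}d", *grad)

def sign(grad, key=KEY):
    """Return (payload, tag), where tag = HMAC-SHA256(key, payload)."""
    payload = pack(grad)
    return payload, hmac.new(key, payload, hashlib.sha256).digest()

def verify(payload, tag, key=KEY):
    """Recompute the tag, compare in constant time, and reject on mismatch."""
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("gradient message failed authentication")
    return list(struct.unpack(f"<{len(payload) // 8}d", payload))

def verify_artifact(blob, pinned_sha256):
    """Refuse a model or image whose SHA-256 differs from the pinned digest."""
    if not hmac.compare_digest(hashlib.sha256(blob).hexdigest(), pinned_sha256):
        raise ValueError("artifact digest mismatch")

payload, tag = sign([0.5, -1.25, 3.0])
print(verify(payload, tag))          # [0.5, -1.25, 3.0]

# A man-in-the-middle flips the first coordinate but cannot forge a new tag.
try:
    verify(pack([-0.5, -1.25, 3.0]), tag)
except ValueError as err:
    print("rejected:", err)
```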
Who: A security engineer on a team running a federated next-word prediction model across millions of phones.
Situation: The model improved every night by aggregating on-device updates; no raw text ever left a phone, which the team treated as the end of the privacy story.
Problem: A coordinated set of emulated clients began submitting updates engineered to make the model suggest an offensive phrase after a common trigger word, a model-replacement backdoor delivered entirely through legitimate-looking updates.
Dilemma: Tighten aggregation to reject the malicious updates and risk discarding genuine updates from users with unusual but valid writing styles, or keep simple averaging and ship a poisoned model to millions of devices.
Decision: They replaced plain averaging with a robust aggregator that clips each update's norm and down-weights statistical outliers, accepting a small loss in convergence speed for a bound on any single participant's influence.
How: Per-update norm clipping capped each client's contribution, a coordinate-wise robust mean (the family studied in Section 35.5) absorbed the remaining outliers, and anomaly scoring flagged clusters of suspiciously similar updates for review.
Result: The backdoor's success rate collapsed because no bounded-norm minority could move the aggregate far, and clean accuracy was essentially unchanged.
Lesson: Privacy (no raw data leaves the device) and integrity (the aggregate resists malicious updates) are different guarantees. Federated learning gives the first almost for free and the second not at all; the second must be engineered into the aggregation step.
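Two of the steps in the How line, per-update norm clipping and flagging suspiciously similar updates, can be sketched in a few lines. This is not the team's code, which the case does not show. It assumes:

- $\ell_2$ clipping with a hypothetical bound;
- pairwise cosine similarity above a hypothetical 0.99 threshold as a simple stand-in for anomaly scoring.

The robust-mean step that sits between them is omitted here.

```python
import numpy as np

def clip_update(u, max_norm):
    """Scale u down so that ||u||_2 <= max_norm (no-op if already inside)."""
    norm = np.linalg.norm(u)
    return u if norm <= max_norm else u * (max_norm / norm)

def flag_similar(updates, threshold=0.99):
    """Return index pairs (i, j), i < j, whose cosine similarity exceeds threshold."""
    unit = updates / np.linalg.norm(updates, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = len(updates)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if sim[i, j] > threshold]

rng = np.random.default_rng(3)
d = 100
honest = rng.standard_normal((20, d))                        # diverse genuine updates
payload = rng.standard_normal(d)                             # attacker's shared direction
sybils = 10.0 * payload + 0.01 * rng.standard_normal((4, d)) # large near-copies
updates = np.vstack([honest, sybils])

clipped = np.array([clip_update(u, max_norm=12.0) for u in updates])
print("max norm after clipping:", np.linalg.norm(clipped, axis=1).max())
print("flagged pairs:", flag_similar(clipped))
```

Clipping caps each client's pull on the aggregate but preserves direction, so the four coordinated clients (indices 20 to 23) still stand out as near-duplicates. Independent high-dimensional updates are almost never that aligned.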
4. Defenses: From Encrypted Channels to Robust Aggregation (Intermediate)
The defenses pair with the threat classes, and it is useful to see them as layers rather than as a single switch. At the channel layer, authentication and encryption (mutual TLS between workers and the aggregator, signed model artifacts in the registry) close the man-in-the-middle and supply-chain integrity gaps; they ensure that a message came from who it claims and was not altered, which is the baseline every distributed-training deployment should already have. At the execution layer, trusted execution environments (hardware enclaves such as confidential-computing VMs and GPUs) let a node prove it ran the agreed code on the agreed data, shrinking the trust placed in a co-tenant or a cloud operator. At the data layer, differential privacy bounds how much any single record can influence the model, defeating membership inference and gradient reconstruction by adding calibrated noise, at a measured cost in accuracy.
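The data-layer defense rests on two operations: clipping and calibrated noise. Below is a minimal sketch of the Gaussian mechanism applied to a federated mean, assuming neighboring datasets differ by adding or removing one client. Clipping bounds each client's contribution to the sum by `clip_norm`, which is the sensitivity. Gaussian noise with standard deviation `noise_multiplier * clip_norm` then masks that contribution.

The function and parameter names are illustrative. Turning `noise_multiplier` into an $(\epsilon, \delta)$ guarantee over many rounds needs a privacy accountant, which is not shown.

```python
import numpy as np

def dp_federated_mean(updates, clip_norm, noise_multiplier, rng):
    """Clip each row to L2 norm <= clip_norm, sum, add Gaussian noise, average."""
    norms = np.linalg.norm(updates, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped_sum = (updates * scale).sum(axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped_sum.shape)
    return (clipped_sum + noise) / len(updates)

rng = np.random.default_rng(0)
updates = rng.standard_normal((1000, 10))
private_mean = dp_federated_mean(updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng)

# Sensitivity check with noise off: removing any one client moves the
# clipped sum by at most clip_norm, the quantity the noise is calibrated to.
n = len(updates)
full = dp_federated_mean(updates, 1.0, 0.0, rng) * n
without_first = dp_federated_mean(updates[1:], 1.0, 0.0, rng) * (n - 1)
print("change from removing one client:", np.linalg.norm(full - without_first))
```

The noise on the averaged update shrinks as $1/n$. Privacy is therefore much cheaper in accuracy for a federation of thousands of clients than for a handful.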
The layer this chapter studies most closely is the aggregation layer, because it is where the distinctive AI vulnerability lives. When training averages updates from many sources, the mean is the least robust statistic imaginable: a single value sent to $\pm\infty$ drags the average there too. Anomaly detection can flag updates whose norm or direction departs sharply from the consensus, and robust aggregation replaces the mean with an estimator that a bounded minority cannot move, such as a coordinate-wise median, a trimmed mean, or a geometric-median rule. These are the Byzantine-robust aggregators of Section 35.5, and they are the direct security descendant of the fault-tolerant aggregation lineage that runs from the Byzantine model in Chapter 2 through elastic training to here. The demonstration below shows why the mean fails and why a trimmed estimator survives, motivating the full treatment in the next section.
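Two of the estimators named above fit in a few lines each. The sketch below compares three aggregators on a hypothetical round in which one of ten updates is sent to a huge value:

- the plain mean;
- a coordinate-wise median;
- a geometric median computed by Weiszfeld's algorithm, an iteratively reweighted mean with weights $1/\lVert x_i - z \rVert$.

The iteration cap, tolerance, and distance floor are illustrative choices, not tuned settings.

```python
import numpy as np

def coordinate_median(updates):
    """Median of each coordinate independently."""
    return np.median(updates, axis=0)

def geometric_median(updates, iters=200, tol=1e-10, floor=1e-12):
    """Weiszfeld's algorithm for argmin_z sum_i ||x_i - z||_2."""
    z = updates.mean(axis=0)
    for _ in range(iters):
        # Floor the distances so a point sitting exactly at z cannot divide by zero.
        dist = np.maximum(np.linalg.norm(updates - z, axis=1), floor)
        weights = 1.0 / dist
        z_new = weights @ updates / weights.sum()
        moved = np.linalg.norm(z_new - z)
        z = z_new
        if moved < tol:
            break
    return z

rng = np.random.default_rng(5)
honest = np.array([1.0, 1.0]) + 0.1 * rng.standard_normal((9, 2))
attacker = np.array([[1e6, -1e6]])
updates = np.vstack([honest, attacker])
target = honest.mean(axis=0)

for name, agg in [("mean", updates.mean(axis=0)),
                  ("coordinate median", coordinate_median(updates)),
                  ("geometric median", geometric_median(updates))]:
    print(f"{name:>18}: distance from honest mean = {np.linalg.norm(agg - target):.3g}")
```

Both robust rules stay inside the honest cluster, while the mean is dragged roughly $10^5$ along each coordinate. They differ in one respect: the geometric median is rotation-invariant, whereas the coordinate-wise median depends on the basis in which the update is expressed.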
The robust aggregators sketched here are production features, not research code you must reimplement. Flower (flwr) ships robust federated strategies (for example FedTrimmedAvg and a Krum-style selector) as drop-in replacements for plain FedAvg. TensorFlow Federated exposes tff.aggregators with norm-clipping and zeroing aggregators that compose with secure aggregation. A from-scratch robust round runs to a dozen or more lines: gather updates, clip norms, drop the $f$ largest and smallest values per coordinate, and average the rest. The library version is a one-line strategy swap. One caveat: a trimmed mean computed at the server must see each update in the clear. It therefore cannot run behind secure aggregation, which reveals only the sum. Client-side clipping, by contrast, composes with it:
```python
# Plain federated averaging is one strategy ...
from flwr.server import ServerConfig, start_server
from flwr.server.strategy import FedAvg, FedTrimmedAvg

baseline = FedAvg()  # plain mean of client updates
# ... and a Byzantine-robust trimmed mean is a drop-in replacement.
strategy = FedTrimmedAvg(beta=0.1)  # cut 10% from each tail, per coordinate
# start_server(config=ServerConfig(num_rounds=3), strategy=strategy)
```
Swapping FedAvg for a robust strategy is a one-line change in Flower. The strategy performs the per-coordinate trimming that a hand-rolled defense would otherwise have to assemble itself.
5. Why the Mean Is Not Safe: A Single Update Captures the Average (Intermediate)
The clearest way to feel the integrity problem is to watch one malicious update capture the aggregate. The script below has $N = 32$ nodes report a gradient. Twenty-nine are honest and report noisy copies of the same true gradient, whose first coordinate points clearly positive. Three are malicious and send a large update aimed in the opposite direction. We then compute three aggregates: the plain average over all $N$ nodes; the answer we actually want (the honest mean); and a trimmed mean that drops the $f$ largest and $f$ smallest values per coordinate before averaging. Finally, for $k = 0, 1, \dots, 8$ we add $k$ noiseless copies of the malicious update to the twenty-nine honest reports. This sweep finds how few attackers it takes to flip the sign of the aggregated gradient's first coordinate.
```python
import numpy as np

rng = np.random.default_rng(7)
N, d = 32, 20  # N nodes report a gradient of dimension d
f = 3          # f of them are malicious

# Honest nodes report noisy copies of the same true gradient whose first
# coordinate points clearly in the positive direction.
g_true = rng.standard_normal(d); g_true[0] = 1.0
honest = g_true + 0.15 * rng.standard_normal((N - f, d))

# Malicious nodes send a large, coordinated update aimed the opposite way.
attack_dir = np.zeros(d); attack_dir[0] = 1.0
malicious = -40.0 * attack_dir + 0.15 * rng.standard_normal((f, d))
reports = np.vstack([honest, malicious])

honest_mean = honest.mean(axis=0)  # the answer we WANT
naive_mean = reports.mean(axis=0)  # plain average over all N

# Trimmed mean: drop the f largest and f smallest per coordinate, then average.
srt = np.sort(reports, axis=0)
trimmed = srt[f:N - f].mean(axis=0)

dist = lambda a, b: float(np.linalg.norm(a - b))
print(f"nodes N : {N} (malicious f = {f}, fraction = {f/N:.3f})")
print(f"||naive_mean - honest|| : {dist(naive_mean, honest_mean):.3f}")
print(f"||trimmed - honest|| : {dist(trimmed, honest_mean):.3f}")
print(f"naive_mean[0] : {naive_mean[0]:+.3f} (honest target {honest_mean[0]:+.3f})")
print(f"trimmed[0] : {trimmed[0]:+.3f}")
print()

for k in range(0, 9):  # how few attackers flip the sign of mean[0]?
    mix = np.vstack([honest, np.tile(-40.0 * attack_dir, (k, 1))])
    m = mix.mean(axis=0)[0]
    flag = " <-- mean[0] now points the wrong way" if m < 0 else ""
    print(f" f={k}: naive mean[0] = {m:+.3f}{flag}")
```
```text
nodes N : 32 (malicious f = 3, fraction = 0.094)
||naive_mean - honest|| : 3.865
||trimmed - honest|| : 0.108
naive_mean[0] : -2.867 (honest target +0.981)
trimmed[0] : +0.960

 f=0: naive mean[0] = +0.981
 f=1: naive mean[0] = -0.385 <-- mean[0] now points the wrong way
 f=2: naive mean[0] = -1.663 <-- mean[0] now points the wrong way
 f=3: naive mean[0] = -2.861 <-- mean[0] now points the wrong way
 f=4: naive mean[0] = -3.986 <-- mean[0] now points the wrong way
 f=5: naive mean[0] = -5.045 <-- mean[0] now points the wrong way
 f=6: naive mean[0] = -6.044 <-- mean[0] now points the wrong way
 f=7: naive mean[0] = -6.987 <-- mean[0] now points the wrong way
 f=8: naive mean[0] = -7.879 <-- mean[0] now points the wrong way
```
The number worth dwelling on is that a single attacker, added to twenty-nine honest nodes, flips the sign of the gradient that the whole training step will follow. The honest fleet wanted to move the parameter in one direction; one adversary sending an unbounded update made it move in the other. The trimmed mean resists because it discards the extremes before averaging. As long as at most $f$ values per coordinate are adversarial, every value that survives the trim lies within the range of the honest values. The result therefore stays close to the honest answer: +0.960 against a target of +0.981 in the first coordinate. (The script trims exactly $f$ per end, assuming the defender knows the adversary's budget.) This is the entire argument for Byzantine-robust aggregation, made in about thirty lines of code. When you average values you do not control, the mean is an attack vector, and the fix is to replace it with an estimator whose breakdown point exceeds the adversary's budget $f/N$. Section 35.5 turns this observation into named algorithms with provable guarantees.
The combine step that Chapter 1 celebrated as exact, summing one vector per worker and broadcasting the result, is the same step an adversary attacks here. Scale-out made averaging the heart of distributed training; security observes that the heart is undefended, because a sum trusts every term equally. The book's signature primitive does not disappear under an adversary; it acquires a robustness requirement. Every robust aggregator in Section 35.5 is an all-reduce that has learned not to trust all of its inputs, the natural continuation of the fault-tolerance arc that began with the Byzantine model in Chapter 2.
Three threads are especially active. First, robust aggregation under realistic heterogeneity. Classical rules such as Krum and coordinate-wise median assume nearly identical honest updates, which fails when clients hold non-IID data. Recent work (bucketing-based robust aggregation and the FLTrust lineage) therefore couples robustness with personalization and a small trusted root dataset. Second, stealthy and adaptive poisoning. Model-replacement and "a little is enough" attacks craft malicious updates that stay inside the norm bounds defenses check. This has prompted certified-robustness work that proves a bound on attacker influence rather than testing against known attacks. Third, the security of large models specifically. Training-data extraction from production LLMs, prompt injection as an inference-time integrity attack on agentic systems (Chapter 32), and watermarking weights to detect extraction are all 2024-to-2026 fronts where the attack surface of distribution meets the scale of foundation models. The unifying lesson is that defenses must state and certify the adversary budget $f/N$ they tolerate, not merely survive yesterday's attack.
The unnerving thing about the attacker in Output 35.3.2 is how cooperative it looks. It shows up on time, sends a correctly shaped vector, and participates in every round. It breaks no protocol and trips no liveness alarm; it simply lies about the value. A crashed node at least has the decency to go quiet. A Byzantine node keeps smiling and shaking hands while it steers your gradient off a cliff, which is precisely why reliability mechanisms tuned for silence never see it coming.
Take a concrete distributed AI system you know (a federated recommender on phones, a multi-tenant training cluster, or a public image-classification API). For each of the four assets in Table 35.3.1 (training data, model weights, gradients, predictions), name one realistic confidentiality, integrity, and availability attack against that specific system, and state which trust boundary in Figure 35.3.1 the attacker must cross to mount it. Identify which single cell you judge highest-risk for your system and justify the ranking.
Extend Code 35.3.2. Hold the honest nodes fixed and increase the malicious fraction $f/N$ from $0$ to $0.5$, setting the trimmed mean's trim count so it always drops $f$ values from each end. Plot the distance from the honest mean for both the plain mean and the trimmed mean as $f/N$ grows. Identify empirically the fraction at which the trimmed mean's error begins to blow up. Relate it to the well-known result that coordinate-wise robust estimators tolerate up to (but not including) half the values being adversarial. Then make the attack adaptive: instead of a fixed $-40$ direction, have the attackers place their values just inside the largest honest value per coordinate. Explain why this weakens the trimmed-mean defense.
Consider the gradient channel between workers and a parameter server (Chapter 11), and suppose updates travel in plaintext over a shared datacenter network. Enumerate the confidentiality and integrity attacks this enables. Then, taking the following defenses one at a time, argue which attacks each does and does not stop: (a) mutual TLS on the channel, (b) differential privacy on each update, (c) robust aggregation at the server. Show that no single defense covers all of confidentiality, integrity, and the gradient-reconstruction risk. Propose a minimal combination that does, stating the cost each layer adds.