Section 39.10: Project Extension | Building Scalable AI

"I learned to fly one drone in a simulator. They added eleven more, took away the map, and told me there was no longer anyone in charge. I have never been happier."
A Simulated Swarm, Practicing for a Field It Has Never Flown

Big Picture

This chapter taught a drone swarm as a stack of decentralized mechanisms; this final section hands it back to you as a buildable project that grows a single-agent navigation baseline into the full swarm one mechanism at a time, with a measurable milestone at every step. You will start where every robotics project starts, one agent finding its way across a simulated arena on a machine that knows everything. Then you will add, in the order the chapter introduced them, the parts that turn one agent into a swarm with no center: consensus-based coordination (Section 39.2), decentralized task allocation (Section 39.3), a range-limited communication model (Section 39.4), shared-map fusion (Section 39.5), decentralized control with collision avoidance (Section 39.6), a centralized-training decentralized-execution (CTDE) reinforcement-learning policy trained across parallel simulators (Section 39.7), domain randomization for the simulation-to-reality gap (Section 39.8), and injected failures and Byzantine agents to test safety (Section 39.9). The project is a distributed embodied intelligence in miniature: no coordinator, every decision local, the whole thing scaling with the swarm rather than with any one machine. It is the book's thesis made to fly.

A swarm read passively teaches the shape of decentralized control; a swarm rebuilt teaches its cost. The nine sections before this one walked the swarm as a finished system, naming the decentralized mechanism behind each capability and the chapters of Part VI and Part VII that own it. This section inverts that posture. It gives you a staged construction plan in which you begin with a baseline small enough to run on one laptop and honest enough to measure, a single agent navigating a known arena, then remove one centralized assumption per stage, measuring what the decentralization bought you before moving on. Each stage draws on a specific earlier chapter, so the project doubles as a guided tour back through the multi-agent half of the book: when you add consensus you are applying the agreement protocols of Chapter 29, when you train the joint policy you are applying the MARL machinery of Chapter 30 over the actor-learner infrastructure of Chapter 20, and when you harden against Byzantine agents you are applying the robust aggregation of Chapter 35. The discipline that makes the project worth its time is the one Section 1.1 opened with: decentralize a capability only when a ceiling forces it, and prove with a number that the decentralization held.

Figure 39.10.1: The staged build. The baseline at the top navigates one agent across a known arena on one machine; each numbered phase below removes one centralized assumption, drawing on the chapter that owns that mechanism, and attaches a measurable milestone (alignment and coverage, collision rate, task completion, sim-to-real gap, robustness to $k$ failures). Carried to the end, the five phases turn one agent into a decentralized swarm whose intelligence lives in no single machine, the distributed-intelligence axis of Section 1.2 made embodied.

1. The Baseline You Scale Out From (Beginner)

Every honest swarm project begins with a single-agent baseline, for two reasons. The first is correctness: a swarm is only worth building if it does something one agent cannot, and you cannot show that against a baseline you never built. The second is measurement: every swarm metric, from coverage time to collision rate, is read against the single-agent reference. Your baseline is one agent in a simulated arena that knows everything: it has the full map, a single goal, and a planner that drives it from start to finish in one process. The arena is small enough to run on a laptop and structured enough that coordination will visibly matter once you add agents; a bounded 2D world with a few obstacles and a coverage objective is the right size. This baseline is centralized in every way the swarm will eventually reject, which is exactly why it is the right place to start: each later phase is defined by precisely which centralized assumption it deletes.

The first assumption to delete is the central coordinator itself. The code below is a swarm-in-miniature: a dozen agents, each with only a heading and a position, that reach a common heading using decentralized consensus over range-limited neighbors alone, never a broadcast and never a leader. It is a noise-free version of the Vicsek alignment model, the simplest honest stand-in for the consensus of Section 39.2 and the flocking of Chapter 31. Each agent averages the heading vectors of the agents within its communication radius and adopts the result; the swarm-level alignment is read by the Vicsek order parameter, which runs from $0$ for chaos to $1$ for a perfectly aligned flock. That an aligned heading emerges from purely local averaging, with no agent ever seeing the whole swarm, is the decentralization invariant your project must preserve at every later stage.

import math, random

random.seed(7)
N, R, STEPS = 12, 0.45, 60          # agents, comm radius, rounds
# Each agent: position in the unit square and a heading angle. No center.
pos = [(random.random(), random.random()) for _ in range(N)]
ang = [random.uniform(-math.pi, math.pi) for _ in range(N)]

def neighbors(i):                   # range-limited comms: only agents within R
    out = []
    for j in range(N):
        dx, dy = pos[i][0]-pos[j][0], pos[i][1]-pos[j][1]
        if dx*dx + dy*dy <= R*R:    # j includes i itself
            out.append(j)
    return out

def order():                        # Vicsek order parameter: 0=chaos, 1=aligned
    sx = sum(math.cos(a) for a in ang)
    sy = sum(math.sin(a) for a in ang)
    return math.hypot(sx, sy) / N

print("round  order  mean_degree")
print(f"{0:5d}  {order():.3f}")
for t in range(1, STEPS+1):
    new, deg = [], 0
    for i in range(N):
        nb = neighbors(i)
        deg += len(nb)
        # decentralized consensus: average the heading vectors of neighbors only
        cx = sum(math.cos(ang[j]) for j in nb) / len(nb)
        cy = sum(math.sin(ang[j]) for j in nb) / len(nb)
        new.append(math.atan2(cy, cx))
    ang = new
    # move a small step along the new heading (keeps the swarm mobile)
    pos = [((pos[i][0] + 0.02*math.cos(ang[i])) % 1.0,
            (pos[i][1] + 0.02*math.sin(ang[i])) % 1.0) for i in range(N)]
    if t in (1, 5, 15, 30, 60):
        print(f"{t:5d}  {order():.3f}  {deg/N:.1f}")

print(f"\nfinal order parameter : {order():.3f}")
print(f"neighbor msgs/round   : ~{sum(len(neighbors(i)) for i in range(N))} (range-limited)")
print(f"all-to-all msgs/round : {N*(N-1)} (centralized broadcast)")

Code 39.10.1: The swarm-in-miniature first scale-out move. Twelve mobile agents reach a common heading by averaging only the headings of neighbors within radius $R$, with no coordinator; the Vicsek order parameter measures how aligned the flock becomes, and the message counts contrast neighbor-only with all-to-all communication.

round  order  mean_degree
    0  0.280
    1  0.661  4.8
    5  0.718  4.7
   15  0.904  3.5
   30  1.000  7.3
   60  1.000  6.0

final order parameter : 1.000
neighbor msgs/round   : ~56 (range-limited)
all-to-all msgs/round : 132 (centralized broadcast)

Output 39.10.1: The swarm converges from near-chaos (order $0.280$) to a perfectly aligned heading (order $1.000$) by round thirty, using only local averaging. The sampled mean degree ranges from $3.5$ to $7.3$ and counts each agent as its own neighbor, so the $56$ range-limited messages in the final tally include $12$ self-entries. Even so, neighbor-only communication moves roughly $56$ messages per round against $132$ for an all-to-all broadcast, the saving that lets the scheme scale with the swarm.

Key Insight: Build the Centralized Baseline First, Then Delete One Assumption Per Phase

The temptation is to start with the full decentralized swarm, because decentralization is the interesting part. Resist it. Without a single-agent, fully-centralized baseline you cannot say what coordination bought, cannot detect when a decentralized policy quietly does worse than one omniscient agent, and cannot tell whether the swarm is correct or merely lively. The order parameter climbing to $1.000$ in Output 39.10.1 is exactly the kind of check a baseline makes possible: it proves alignment emerged from local rules, not from a coordinator you forgot to remove. Each phase of this project is defined by the single centralized assumption it deletes, and the milestone for that phase is the number that confirms the deletion did not break the mission.

2. Staging the Swarm, Milestone by Milestone (Intermediate)

With the baseline in hand, you grow it into a swarm one mechanism at a time, in the order of Figure 39.10.1, never advancing until the current phase hits its milestone. The discipline of one-mechanism-at-a-time matters because it isolates cause and effect: when coverage time drops or the collision rate spikes, exactly one thing changed, and you know which section's mechanism to blame. Table 39.10.1 is the project plan. Each row names the mechanism, the centralized assumption it deletes, the section or chapter that supplies it, and the measurable milestone that tells you the phase is done.

Table 39.10.1: The staged build plan. Add mechanisms top to bottom; do not advance until the milestone is met. Each row deletes one centralized assumption and draws its mechanism from the named section or chapter.

Mechanism to add	Centralized assumption deleted	Source (section / chapter)	Milestone to hit
1. Consensus coordination	A leader sets the shared variable	Section 39.2, Ch 29	Order parameter $\ge 0.95$ from local rules
2. Decentralized task allocation	A scheduler assigns every task	Section 39.3, Ch 29	All tasks claimed, no double-assignment
3. Range-limited comms	Every agent hears every agent	Section 39.4, Ch 34	Coverage time held as range $R$ shrinks
4. Shared-map fusion	One global map is given	Section 39.5	Fused map error within tolerance of truth
5. Decentralized control + avoidance	A central planner routes all agents	Section 39.6, Ch 31	Collision rate $\to 0$ at target density
6. CTDE MARL policy	Hand-tuned local rules	Section 39.7, Ch 30, Ch 20	Task completion beats the scripted swarm
7. Domain randomization	One fixed simulator equals reality	Section 39.8, Ch 34	Sim-to-real performance gap below threshold
8. Failure / Byzantine injection	Every agent is alive and honest	Section 39.9, Ch 35	Mission completes despite $k$ of $n$ failures

The milestones are quantitative on purpose. "Added consensus" is not a milestone; "the order parameter reaches $0.95$ using only neighbor averaging" is. Phases 1 through 4 are coordination-and-perception phases, judged by alignment, allocation correctness, coverage time under shrinking communication range, and map-fusion accuracy. Phase 5 is a safety phase, where decentralized reciprocal collision avoidance (Chapter 31) must drive the collision rate to zero even as the agents pack closer together. Phase 6 swaps the hand-tuned rules for a learned CTDE policy trained across parallel simulators (Chapter 30) on the actor-learner infrastructure of Chapter 20, and the milestone forbids shipping a learned policy that does not beat the scripted swarm it replaces. Phase 7 adds domain randomization (Section 39.8) so the policy survives the move from one simulator to the messy field, the on-device deployment concern of Chapter 34. Phase 8 closes the loop by injecting crashes and Byzantine agents and demanding, with the robust-aggregation tools of Chapter 35, that the mission completes anyway.
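A quantitative milestone can be written as a check that either passes or fails. The sketch below does this for the phase-2 milestone, "all tasks claimed, no double-assignment". It is a minimal illustration under stated assumptions: `greedy_auction` and `allocation_ok` are hypothetical names, and the auction resolves every bid in one loop for readability, where a real swarm would exchange bids only over the neighbor graph. The part worth keeping is the check, not the allocator.

```python
import math
import random

def greedy_auction(agent_pos, task_pos):
    """Sequential single-item auction (illustrative stand-in, not a swarm protocol).

    Each round, every free agent bids its distance to every open task and the
    lowest bid wins. Returns {task_index: agent_index}.
    """
    open_tasks = set(range(len(task_pos)))
    free_agents = set(range(len(agent_pos)))
    assignment = {}
    while open_tasks and free_agents:
        _, agent, task = min(
            (math.dist(agent_pos[a], task_pos[t]), a, t)
            for a in free_agents
            for t in open_tasks
        )
        assignment[task] = agent
        open_tasks.remove(task)
        free_agents.remove(agent)
    return assignment

def allocation_ok(assignment, n_tasks):
    """Phase-2 milestone: every task claimed, and no agent holds two tasks."""
    all_claimed = set(assignment) == set(range(n_tasks))
    no_double = len(set(assignment.values())) == len(assignment)
    return all_claimed and no_double

rng = random.Random(3)
agents = [(rng.random(), rng.random()) for _ in range(12)]
tasks = [(rng.random(), rng.random()) for _ in range(8)]
plan = greedy_auction(agents, tasks)
print("phase-2 milestone met:", allocation_ok(plan, len(tasks)))
```

Writing the milestone as a function first means a later swap to a properly decentralized allocator is judged by the same test.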

Practical Example: The Capstone That Grew One Mechanism at a Time

Who: A robotics graduate student building this exact project as a term capstone on a single workstation with one GPU plus a free tier of cloud CPU for parallel simulators.

Situation: The single-agent baseline navigated a known arena in simulation in seconds and reached its goal reliably.

Problem: The assignment required a decentralized swarm that survived field-like failures, but a first attempt that wired up all eight mechanisms at once produced a swarm that thrashed, collided, and could not be debugged.

Dilemma: Build the whole decentralized swarm in one leap, fast to write but a black box the moment it misbehaved, or add one mechanism at a time against milestones, slower to start but debuggable throughout.

Decision: The staged plan of Table 39.10.1. The student added consensus first, confirmed the order parameter passed $0.95$, then added allocation, then narrowed the communication range, refusing to touch collision avoidance until coverage held.

How: The CTDE policy trained across sixty-four parallel CPU simulators, following the actor-learner pattern of Chapter 20; domain randomization over mass, drag, and sensor noise closed most of the sim-to-real gap; a median-based robust merge held the formation when two of twelve agents were made Byzantine.

Result: The final swarm covered the arena faster than twelve uncoordinated agents, kept a zero collision rate at the target density, and completed the mission with two agents failed, every claim backed by a number measured against the baseline.

Lesson: Add one mechanism, hit its milestone, and only then add the next. A swarm built in one leap is a swarm you cannot trust; a swarm built in stages is a swarm whose every number you can defend.
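The median-based merge in the example above can be illustrated with a single aggregation step. This is a sketch under assumptions: the text does not say which median the student used, so a coordinate-wise median of the heading vectors is swapped in here, and `mean_merge` and `median_merge` are hypothetical names. Ten honest agents report headings near zero and two Byzantine agents report a heading of $\pi/2$.

```python
import math
import statistics

def mean_merge(headings):
    """Plain consensus step: average the unit heading vectors."""
    cx = sum(math.cos(a) for a in headings) / len(headings)
    cy = sum(math.sin(a) for a in headings) / len(headings)
    return math.atan2(cy, cx)

def median_merge(headings):
    """Robust step (assumed variant): coordinate-wise median of the vectors."""
    cx = statistics.median(math.cos(a) for a in headings)
    cy = statistics.median(math.sin(a) for a in headings)
    return math.atan2(cy, cx)

honest = [0.0, 0.02, -0.02, 0.01, -0.01, 0.03, -0.03, 0.015, -0.015, 0.0]
byzantine = [math.pi / 2, math.pi / 2]   # two of twelve report a wrong heading
reports = honest + byzantine

print(f"mean merge heading   : {mean_merge(reports):+.3f} rad")
print(f"median merge heading : {median_merge(reports):+.3f} rad")
```

The mean is pulled toward the Byzantine reports while the median stays close to the honest cluster, because two outliers out of twelve cannot move the middle of either sorted coordinate list.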

3. The Numbers Your Swarm Must Hit (Intermediate)

A swarm project lives or dies by whether its milestones are measured, not felt, so each phase targets a number you compute in advance. Four families of metric carry the whole project. Coordination quality is the Vicsek order parameter from Code 39.10.1, $\phi = \frac{1}{N}\left\lVert \sum_{i=1}^{N} (\cos\theta_i,\, \sin\theta_i) \right\rVert$, which the consensus phase must drive above $0.95$. Coverage efficiency is the time for $n$ agents to sweep an arena of area $A$ with sensing radius $r$; the geometric floor is $T_n \ge A / (n \cdot v \cdot 2r)$ for speed $v$, so the coverage milestone asks the swarm to come within a constant factor of that floor and to fall roughly as $1/n$ as agents are added. Safety is the collision rate $C = (\text{collision events}) / (\text{agent-steps})$, which the avoidance phase must drive to zero at the target packing density. Robustness is the largest $k$ such that the mission still completes with $k$ of $n$ agents failed or Byzantine; classical robust aggregation tolerates $k < n/2$ crash failures and $k < n/3$ Byzantine ones, the bound from Chapter 35 your phase-8 milestone inherits.
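The metrics in this paragraph are each a one-line computation, sketched below so the targets can be calculated before anything is built. The function names and the example numbers (a 2,500 square meter arena, sensing radius 2.5 m, speed 1 m/s) are illustrative assumptions, not values from the chapter.

```python
import math

def order_parameter(headings):
    """Vicsek order parameter: length of the mean unit heading vector."""
    sx = sum(math.cos(a) for a in headings)
    sy = sum(math.sin(a) for a in headings)
    return math.hypot(sx, sy) / len(headings)

def coverage_floor(area, n, speed, sense_radius):
    """Geometric lower bound on coverage time: A / (n * v * 2r)."""
    return area / (n * speed * 2 * sense_radius)

def collision_rate(collision_events, n_agents, n_steps):
    """Collisions per agent-step."""
    return collision_events / (n_agents * n_steps)

def max_tolerated(n):
    """Largest k with k < n/2 (crash) and largest k with k < n/3 (Byzantine)."""
    return (n - 1) // 2, (n - 1) // 3

for n in (1, 5, 25):
    print(f"n={n:2d}  coverage floor = {coverage_floor(2500, n, 1.0, 2.5):6.1f} s")
print("collision rate:", collision_rate(3, 12, 1000))
print("tolerated (crash, Byzantine) at n=12:", max_tolerated(12))
```

The coverage floor falls as $1/n$ by construction, so a measured coverage time that stops falling as agents are added points at coordination overhead rather than geometry.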

Two further numbers govern whether the design scales at all. The first is message complexity, the lever that separates a swarm that grows from one that chokes. With range-limited communication each agent talks only to its neighbors, so per-round traffic is $O(n \bar{d})$ for average degree $\bar{d}$; centralized broadcast is $O(n^2)$. Output 39.10.1 makes the gap concrete at $n = 12$: about $56$ neighbor messages against $132$ for all-to-all, and the gap widens with every agent you add, which is why the swarm scales with $n$ while the broadcast does not. The second is simulation throughput, the budget for the MARL phase. Training a CTDE policy needs environment frames, and the actor-learner architecture of Chapter 20 produces them in parallel: with $p$ simulator workers each stepping $f$ frames per second the aggregate is $\Phi = p \cdot f$ frames per second, so a target of $10^8$ training frames at $f = 2{,}000$ frames per second per worker needs about $\lceil 10^8 / (T \cdot 2{,}000) \rceil$ workers to finish in time $T$; at $p = 64$ workers that is roughly a $13$-minute run, the sim-throughput argument from Section 39.7 made arithmetic. Compute these targets before you build, so each milestone is a prediction you test rather than a result you rationalize.
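The two scaling numbers above reduce to short arithmetic, sketched here with hypothetical helper names. `messages_per_round` takes a mean degree that excludes the agent itself; the throughput helpers apply $\Phi = p \cdot f$ directly.

```python
import math

def messages_per_round(n, mean_degree=None):
    """Directed messages per round: n*d for neighbor-only, n*(n-1) for all-to-all."""
    return n * (n - 1) if mean_degree is None else n * mean_degree

def workers_needed(total_frames, frames_per_sec_per_worker, deadline_sec):
    """Smallest worker count p with p * f * T >= total_frames."""
    return math.ceil(total_frames / (deadline_sec * frames_per_sec_per_worker))

def run_time_sec(total_frames, frames_per_sec_per_worker, workers):
    """Wall-clock time at aggregate throughput p * f."""
    return total_frames / (workers * frames_per_sec_per_worker)

print("all-to-all at n=12      :", messages_per_round(12))
print("neighbor-only, n=1000   :", messages_per_round(1000, mean_degree=6))
print("all-to-all,    n=1000   :", messages_per_round(1000))

t = run_time_sec(10**8, 2000, 64)
print(f"1e8 frames on 64 workers: {t:.2f} s = {t / 60:.1f} min")
print("workers for a 15-minute deadline:", workers_needed(10**8, 2000, 15 * 60))
```

These figures assume ideal linear scaling across workers; any learner-side bottleneck or synchronization stall makes the real run longer than the estimate.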

Library Shortcut: Each Phase Is a Few Lines in a Multi-Agent Tool

The hand-rolled consensus in Code 39.10.1 is for understanding; in the real project each phase maps to a framework that handles the multi-agent plumbing for you. PettingZoo gives the standard multi-agent environment API, and RLlib trains the CTDE policy across parallel simulators with the actor-learner infrastructure of Chapter 20 underneath. Code 39.10.2 names that mapping, turning the staged plan into a near-deployment plan:

# Each staged mechanism -> the multi-agent tool that owns it.
STACK = {
    "consensus / flocking": "numpy + a comms graph   # local averaging, Vicsek",
    "task allocation":      "auction / CBBA helper    # decentralized claims",
    "range-limited comms":  "PettingZoo ParallelEnv   # per-agent local obs only",
    "map fusion":           "numpy / g2o pose graph    # distributed SLAM merge",
    "collision avoidance":  "RVO2 / ORCA              # reciprocal velocity obstacles",
    "CTDE MARL policy":     "RLlib MADDPG / MAPPO     # central critic, local actors",
    "domain randomization": "gymnasium wrappers        # randomize mass, drag, noise",
    "Byzantine robustness": "median / Krum aggregator  # Chapter 35 robust merge",
}
for phase, tool in STACK.items():
    print(f"{phase:22s}-> {tool}")

Code 39.10.2: The eight staged mechanisms mapped to the multi-agent tool that owns each. A finished Table 39.10.1 row selects a key, and the value is the framework the corresponding section teaches; the from-scratch consensus of Code 39.10.1 collapses to a PettingZoo environment and an RLlib MAPPO trainer.

4. Extension Challenges Worth the Swarm (Advanced)

Once the eight phases hit their milestones you have a working decentralized swarm, and the project becomes a platform for the harder questions the chapter only gestured at. Each extension below adds one capability that a real field swarm needs, and each reaches into a different part of the book, so finishing them turns the capstone from a flock into a system. Scale the swarm: push $n$ from a dozen to hundreds and watch which mechanism breaks first, then confirm that range-limited communication keeps per-round traffic at $O(n\bar{d})$ rather than $O(n^2)$, the scaling claim of Section 3 turned into an experiment. Build a heterogeneous team: mix fast scouts with slow heavy-lift agents and extend the decentralized allocation of Section 39.3 so tasks flow to the agent best suited to them, measuring whether a mixed team beats a uniform one on the same mission. Add adversarial agents that actively work against the swarm rather than merely failing, and harden the consensus and allocation against them with the robust aggregation of Chapter 35, reporting the largest adversarial fraction the mission survives.
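The scale-the-swarm experiment can be prototyped before any policy is involved. The sketch below counts directed neighbor messages as $n$ grows, under one assumption that matters: agent density is held fixed (12 agents per unit area, matching Code 39.10.1) by growing the arena with $n$. The name `neighbor_messages` and its parameters are hypothetical.

```python
import math
import random

def neighbor_messages(n, density=12.0, radius=0.45, seed=0):
    """Count directed neighbor messages for n agents at fixed density."""
    rng = random.Random(seed)
    side = math.sqrt(n / density)   # arena side grows so density stays constant
    pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    msgs = 0
    for i in range(n):
        for j in range(n):
            # self-messages are excluded, unlike the count in Code 39.10.1
            if i != j and math.dist(pts[i], pts[j]) <= radius:
                msgs += 1
    return msgs

print("    n  neighbor  all-to-all  mean_degree")
for n in (12, 48, 192, 768):
    m = neighbor_messages(n)
    print(f"{n:5d}  {m:8d}  {n * (n - 1):10d}  {m / n:11.2f}")
```

If the arena is instead held fixed while $n$ grows, the mean degree itself grows with $n$ and the traffic drifts back toward $O(n^2)$, so the experiment should state which regime it measures.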

The final extension is the one that makes the project real: port the policy to hardware. Take the domain-randomized CTDE policy from phase 7 and run it on a small fleet of physical robots or a high-fidelity simulator standing in for them, the on-device deployment problem of Chapter 34, where each agent must run its share of the policy under a real compute and energy budget with no cloud to lean on. The sim-to-real gap you measured in phase 7 becomes the headline number: how much performance survives the crossing from simulator to field, and which slice of domain randomization mattered most. Each extension is a small, bounded change to a working swarm, which is exactly the posture in which decentralized-systems concepts are learned best: against a baseline you can measure, in a swarm you already understand.
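The domain randomization that the hardware port relies on amounts to drawing fresh physical parameters for every training episode. The sketch below shows that step for the three quantities the chapter names (mass, drag, sensor noise). The ranges, the `SimParams` class, and the `surviving_fraction` definition are all hypothetical placeholders to be replaced with values measured on the target platform.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class SimParams:
    mass_kg: float
    drag_coeff: float
    sensor_noise_std: float

# Hypothetical ranges, chosen only for illustration.
RANGES = {
    "mass_kg": (0.8, 1.2),
    "drag_coeff": (0.05, 0.15),
    "sensor_noise_std": (0.0, 0.05),
}

def sample_params(rng):
    """Draw one randomized world; call once per training episode."""
    return SimParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})

def surviving_fraction(sim_score, real_score):
    """One possible headline number: share of simulated performance kept in the field."""
    return real_score / sim_score

rng = random.Random(0)
for episode in range(3):
    print(episode, sample_params(rng))
print(f"surviving fraction: {surviving_fraction(0.90, 0.72):.2f}")
```

To find which slice of randomization mattered most, one option is to retrain with a single range collapsed to its midpoint and compare the surviving fraction against the fully randomized policy.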

Research Frontier: Where Decentralized Swarms Are Heading (2024 to 2026)

The extensions above track live research lines, so a capstone that implements them is working at the current edge. Scalable MARL has moved toward graph-attention and mean-field policies that keep learning stable as $n$ grows into the hundreds, with GigaStep and related massively parallel swarm simulators pushing training to billions of frames on a single accelerator. Sim-to-real for aerial swarms has matured from single-drone transfer to whole formations, with neural-swarm controllers that learn the aerodynamic downwash between close-flying drones and transfer to physical fleets. Decentralized perception is converging on collaborative and distributed SLAM where agents fuse local maps over a bandwidth-limited graph rather than a shared server, the phase-4 problem at field scale. On the safety side, Byzantine-robust and resilient consensus for mobile multi-robot teams is an active line, tightening the $k < n/3$ bound under motion and intermittent connectivity, and learned and certified safe control are being combined so a swarm can be both adaptive and provably collision-free. Treating the swarm as a learning, sensing, and reasoning collective rather than a fixed controller is, as of 2026, the most active frontier in multi-robot systems, and it is the bridge from this chapter into the agentic systems of Chapter 40.

5. Chapter Summary and What You Built (Beginner)

This section closes Chapter 39, so it is worth stating the through-line the whole chapter built. We began with the problem definition (Section 39.1): coordinate a swarm of mobile, communication-limited robots toward a shared mission with no central coordinator, no global map, and no guarantee that every agent stays alive and honest. From there the chapter walked the swarm capability by capability, and every capability was the same move applied to a different part of the system: decide locally, exchange only what reaches a neighbor, and let the collective behavior emerge. Consensus coordination (Section 39.2) replaced a leader with local agreement, applying the protocols of Chapter 29. Decentralized task allocation (Section 39.3) replaced a scheduler with agents claiming their own work. The range-limited comm model (Section 39.4) and shared-map fusion (Section 39.5) replaced omniscience with neighbor-only sensing and a map stitched from local pieces. Decentralized control with collision avoidance (Section 39.6) replaced a central planner with reciprocal local maneuvers from the swarm intelligence of Chapter 31. The CTDE MARL policy (Section 39.7) replaced hand-tuned rules with a learned joint policy trained across parallel simulators on the infrastructure of Chapter 20 and the MARL methods of Chapter 30. Domain randomization (Section 39.8) replaced a single simulator with a distribution of worlds so the policy survived the field. And failure injection (Section 39.9) proved the swarm holds when agents crash or turn Byzantine, applying the robust aggregation of Chapter 35. The chapter is, end to end, one embodied distributed intelligence spread across a mobile, communication-limited swarm with no center.

Thesis Thread: Distributed Intelligence That Moves

The book's spine is that AI at scale is the engineering of systems whose data, computation, models, inference, and decisions are distributed across many machines, and that each distribution is forced by a ceiling, not chosen for elegance. A drone swarm is the most literal demonstration of that thesis in the book, because the machines are not in a rack: they are scattered across the field, they cannot all hear one another, and any of them may vanish mid-mission. There is no center to fall back on, so distribution is not an optimization here but a precondition, and the intelligence genuinely lives in no single machine. The staged project in this section is that thesis made buildable and made to fly: you do not read about decentralized embodied intelligence, you assemble it, one deleted centralized assumption at a time, and watch a single navigating agent become the swarm it was always going to be. The same move, decide locally and recombine, that synchronized gradients in Section 1.1 now flies a formation with no one in charge.

Key Takeaway: Chapter 39 as a Buildable Swarm

A drone swarm is not eight unrelated tricks; it is one decentralized embodied intelligence in which every capability is the same decide-locally-and-recombine move applied to a different part of the system. (1) Consensus replaces the leader, so a shared heading or estimate emerges from local averaging. (2) Decentralized allocation replaces the scheduler, so agents claim their own tasks. (3) Range-limited comms replace omniscience, so traffic stays $O(n\bar{d})$ rather than $O(n^2)$. (4) Map fusion replaces the global map, so perception is stitched from neighbors. (5) Collision avoidance replaces the central planner, so safety is reciprocal and local. (6) A CTDE MARL policy replaces hand-tuned rules, trained across parallel simulators. (7) Domain randomization replaces one simulator, so the policy crosses to the field. (8) Failure injection replaces the assumption that every agent is alive and honest, so the mission survives $k$ of $n$ losses. Built in this order against milestones, the swarm is a distributed intelligence with no center that scales with the swarm, which is why it is the book's thesis made to fly.

Project Ideas: Build the Swarm, Then Push It

Each idea is sized so that carrying it through the staged plan of Table 39.10.1 becomes a capstone in the sense of Chapter 41.

Core build: start from the Code 39.10.1 swarm-in-miniature and grow it through all eight phases in simulation, recording the milestone at each step (order parameter, coverage time, allocation correctness, map error, collision rate, task completion, sim-to-real gap, robustness to $k$ failures); the deliverable is a writeup in which every number is measured against the single-agent baseline.

Scale $n$: push the swarm from a dozen to hundreds of agents, find the mechanism that breaks first, and confirm range-limited comms keep per-round traffic at $O(n\bar{d})$.

Heterogeneous team: mix scouts and heavy-lift agents and extend the decentralized allocation of Section 39.3 so tasks flow to the best-suited agent, reporting the lift over a uniform team.

Adversarial agents: add agents that actively sabotage consensus and allocation, harden with the robust aggregation of Chapter 35, and report the largest adversarial fraction the mission survives.

Real-hardware port: deploy the domain-randomized CTDE policy onto a physical fleet or high-fidelity stand-in under a real edge compute budget (Chapter 34), and make the surviving fraction of performance the headline sim-to-real number.

Exercise 39.10.1: Name the Assumption, Phase by Phase (Conceptual)

For each of the eight phases in Table 39.10.1, state the single centralized assumption it deletes and the section or chapter that supplies the replacement mechanism. Then identify the one phase whose milestone is a safety target rather than a speed or coordination target, and explain why decentralizing that phase can make the swarm dangerous, not merely slower, if you push the agent density too high. Finally, argue which phase you would add first on a real swarm and why profiling the mission, not intuition, should decide that order.

Exercise 39.10.2: Extend the Swarm-in-Miniature (Coding)

Starting from Code 39.10.1, (a) sweep the communication radius $R$ from $0.15$ to $0.6$ and plot the final order parameter against $R$, identifying the rough threshold below which the swarm fails to align because the neighbor graph disconnects. (b) Add agents that ignore their neighbors and hold a fixed random heading (a crash-style failure), starting with three, and measure how the final order parameter degrades as you add more of them. (c) Replace the plain neighbor average with a coordinate-wise median of the neighbor heading vectors $(\cos\theta_j,\, \sin\theta_j)$, inject one Byzantine agent that broadcasts a wildly wrong heading, and show that the median merge keeps the order parameter high where the mean collapses. Relate your result to the $k < n/3$ Byzantine bound from Section 3.

Exercise 39.10.3: Size the Swarm and Bound the Coverage (Analysis)

A target mission asks $n$ agents to cover an arena of area $A = 10{,}000$ square meters, each with sensing radius $r = 5$ meters and speed $v = 2$ meters per second. (a) Using the geometric floor $T_n \ge A / (n \cdot v \cdot 2r)$ from Section 3, compute the best-case coverage time for $n = 4$, $n = 16$, and $n = 64$ agents, and state how coverage time scales with $n$. (b) Compare per-round message complexity for $n = 64$ under range-limited comms with average degree $\bar{d} = 6$ against all-to-all broadcast, and state the ratio. (c) Suppose phase 6 needs $5 \times 10^7$ training frames and each simulator worker runs at $f = 2{,}000$ frames per second; compute how many workers $p$ you need to finish training in ten minutes, and explain which chapter's actor-learner infrastructure supplies those parallel simulators.