Part VIII: Case Studies and Capstone Projects
Chapter 39: Multi-Agent Robotics and Drone Swarms

Decentralized Control

"I never knew what the formation looked like. I only knew I was a little too far from the two robots beside me, so I closed the gap. Apparently that was the whole plan."

A Controller, Acting on What It Can See, Trusting the Rest to Do the Same
Big Picture

Decentralized control is the field-time counterpart of coordination: where coordination (Section 39.2) decides what the team should achieve, control is the loop that runs on each robot at hundreds of hertz, computing one motor command from what that robot can sense right now. The defining constraint is locality. No robot holds the global state, no robot waits on a central optimizer, and yet the team must close into a formation, avoid collisions, and stay stable. This section shows how a purely local feedback law, in which each agent pulls toward agreement with its neighbors, produces coherent global behavior; how each agent can run its own model-predictive optimizer that merely respects its neighbors; and how local safety filters keep the swarm collision-free. It is the embodied, real-time form of the book's thesis that global intelligence needs no central coordinator.

By the time a drone swarm is in the air, the planning is over. Section 39.5 fused the distributed sensors into a shared belief about where everything is, and Section 39.2 turned the mission into a coordinated assignment of goals across the team. What remains is the part that cannot be delegated to a server and cannot tolerate a network round trip: the control loop. Each robot, tens or hundreds of times per second, must convert its current estimate of the world into a velocity or a thrust. A central controller that gathered every robot's state, solved one big optimization, and broadcast every command would be elegant on paper and useless in the air, because the latency, the bandwidth, and the single point of failure all scale the wrong way. So control is pushed to the edge, onto each robot, where it becomes a local computation over local information. The remarkable fact, and the subject of this section, is that local control laws can be designed so that the global behavior they produce is exactly the one the coordinator asked for. Figure 39.6.1 sketches the loop a single robot runs.

One robot's local control loop (every neighbor runs the identical loop) Sense neighbor positions Disagreement xᵢ − xⱼ vs desired Control law uᵢ = −Σ(xᵢ−xⱼ) Safety filter ORCA / CBF Actuate motors world changes; loop repeats at hundreds of hertz Neighbor robots (also sensing me)
Figure 39.6.1: The decentralized control loop that runs on each robot. The agent senses only its neighbors, measures how far the realized geometry is from the desired one, applies a local control law to get a command, passes that command through a safety filter (Section 5), and actuates. The dashed couplings show that each robot is simultaneously a neighbor in everyone else's loop; no box in this picture has global state, and there is no central controller anywhere in the diagram.

1. Decentralized Versus Distributed Control Beginner

The two words are often used interchangeably, and the distinction is worth fixing because it determines how much a robot is allowed to know. A decentralized controller computes each robot's action from that robot's own local information only, with no communication beyond what its sensors pick up; the law on robot $i$ is a function of $i$'s state and the states of robots it can directly observe. A distributed controller relaxes this slightly: robots may exchange messages with their neighbors over a communication graph, so robot $i$'s action can depend on quantities its neighbors compute and send, but still never on a global view. Both stand in contrast to centralized control, where one node ingests every robot's state and emits every command. This is the same axis the book drew in Section 1.4 between centralized and decentralized architectures, now realized in the tightest possible loop: a controller that must close in milliseconds.

The reason field robotics lives at the decentralized and distributed end is not ideology; it is physics and failure. A central controller adds a round trip to every control step, and a control loop that must run at 200 hertz cannot afford a 50-millisecond network detour. It also concentrates all risk in one node: lose the coordinator and the whole swarm goes dark, exactly the single point of failure that Chapter 2 warned against. Pushing the loop onto each robot removes both problems at once, at the price of a harder design question, which is the question this section answers: how do you write a rule that sees only a fragment of the world yet drives the whole team to the right global behavior?

Key Insight: The Coordinator Sets the Reference, the Controller Realizes It

Coordination and control are not competitors; they are two layers of the same stack running at two time scales. Coordination (Section 39.2) is slow and semantic: it decides which robot covers which sector, what shape the formation should be, who leads. It produces a reference: a desired position, a desired inter-agent offset, a desired velocity. Control is fast and geometric: at every tick it drives the robot toward that reference using only what it can see. When you read a decentralized control law, ask what reference it is tracking and where that reference came from; the answer is almost always the coordination layer above it.

2. Local Feedback Control With Neighbor Coupling Intermediate

The cleanest decentralized control law is the consensus protocol, and it is worth deriving because every richer controller in this section is a variation on it. Model each robot $i$ as a simple integrator whose state $x_i \in \mathbb{R}^2$ is its position and whose control input $u_i$ is its commanded velocity, so $\dot{x}_i = u_i$. Let $\mathcal{N}(i)$ be the set of neighbors robot $i$ can sense. The consensus control law tells each robot to move so as to reduce its disagreement with each neighbor,

$$u_i = -\sum_{j \in \mathcal{N}(i)} \bigl(x_i - x_j\bigr).$$

Read it literally: robot $i$ looks at each neighbor $j$, measures the vector from $j$ to itself, sums those vectors, and moves in the opposite direction. A robot that is too far from its neighbors is pulled inward; a robot in the middle of its neighbors feels no net pull and holds still. Nothing in this law mentions a global coordinate, a central command, or any robot outside $\mathcal{N}(i)$. Stack all the states into one vector and the whole swarm obeys $\dot{x} = -L x$, where $L$ is the graph Laplacian of the communication graph, the matrix that Chapter 13 used for graph machine learning, now governing motion instead of message passing.

To turn agreement into a useful shape, give each robot a desired offset $\delta_{ij}$ to each neighbor, a small local quantity it can be told once at deployment, and have it drive the relative position to that offset rather than to zero:

$$u_i = -\sum_{j \in \mathcal{N}(i)} \Bigl(\bigl(x_i - x_j\bigr) - \delta_{ij}\Bigr).$$

Now the equilibrium is not a single point but the formation whose pairwise offsets match the $\delta_{ij}$. This is the law the demo in Section 6 runs, and it is the workhorse of decentralized formation control: a pentagon, a line, a grid, all expressed purely as neighbor-relative targets that no robot ever assembles into a global picture.

Thesis Thread: Global Behavior Without a Global View

The consensus law is the most literal possible statement of this book's thesis. Each robot runs $u_i = -\sum_{j}(x_i - x_j)$ on a fragment of the state, no robot ever holds the whole vector $x$, and yet the team provably converges to a single coherent configuration. The all-reduce of Chapter 1 combined partial gradients into one exact global gradient; consensus control combines partial sensing into one exact global formation. Same thesis, two domains: the coherent whole is computed by purely local operations, and the central coordinator is not missing, it was never needed.

3. Stability: When Local Rules Guarantee Global Behavior Advanced

A local rule is only trustworthy if you can prove it produces the intended global behavior, and for the consensus law the proof is clean enough to state. The dynamics $\dot{x} = -L x$ are governed by the eigenvalues of the graph Laplacian $L$. For an undirected, connected communication graph, $L$ is symmetric positive semidefinite with eigenvalues

$$0 = \lambda_1 < \lambda_2 \le \cdots \le \lambda_n.$$

The single zero eigenvalue corresponds to the agreement direction (all robots equal), and the crucial condition is that the second eigenvalue, the algebraic connectivity $\lambda_2$, is strictly positive. Connectivity of the graph is exactly the condition $\lambda_2 > 0$, and when it holds, every disagreement mode decays at a rate set by its eigenvalue, so the swarm converges to consensus exponentially with rate governed by $\lambda_2$. The takeaway is operational: a denser, better-connected communication graph has a larger $\lambda_2$ and therefore a faster, stiffer formation, while a graph on the edge of disconnection has $\lambda_2 \to 0$ and a formation that drifts together sluggishly. Stability is not assumed; it is read off the spectrum of a matrix that the topology alone determines.

This is the guarantee that makes decentralized control engineering rather than hope. You do not test a formation controller by flying a hundred drones and watching; you check that the communication graph stays connected, which bounds $\lambda_2$ away from zero, which bounds the convergence rate, which certifies that local agreement becomes global formation. Section 39.7 will replace this hand-designed law with a learned one, and the price of that power is precisely this guarantee: a neural controller has no Laplacian whose eigenvalues you can inspect, so its stability must be established by other, harder means.

4. Distributed Model-Predictive Control Advanced

The consensus law is a one-step reaction; it does not look ahead, and it knows nothing about a robot's actuation limits or the obstacles in front of it. Model-predictive control (MPC) fixes this by having each robot, at every tick, solve a short-horizon optimization: predict its own motion over the next few hundred milliseconds, choose the control sequence that best tracks its reference while respecting its dynamics and limits, apply only the first step, and re-solve at the next tick. In distributed MPC, each robot solves a local MPC over its own trajectory only, treating its neighbors' predicted trajectories as known boundary conditions that they broadcast. Robot $i$ minimizes

$$\min_{u_i(\cdot)} \; \sum_{t=0}^{H-1} \Bigl\lVert x_i(t) - r_i(t) \Bigr\rVert^2 + \rho \sum_{j \in \mathcal{N}(i)} \Bigl\lVert \bigl(x_i(t) - x_j(t)\bigr) - \delta_{ij} \Bigr\rVert^2$$

subject to its own dynamics and input constraints, where $r_i$ is the reference from the coordinator, the second term couples the robot to its neighbors' predicted trajectories $x_j(t)$, and $\rho$ weights formation-keeping against reference tracking. Each robot solves a small problem over only its own variables; the coupling enters solely through the neighbor trajectories it receives, so the global optimization is decomposed into $n$ local ones that run in parallel on $n$ robots. Robots typically exchange predicted trajectories and re-solve once or a few times per tick so the neighbor predictions each robot assumes are consistent with what its neighbors actually do.

Practical Example: The Warehouse Drones That Stopped Phoning Home

Who: A robotics engineer at a logistics company running a swarm of inventory-scanning drones inside a large distribution center.

Situation: Eighteen drones flew lawnmower patterns over the racks; a central server gathered every pose and emitted every velocity command over Wi-Fi.

Problem: Under Wi-Fi contention the command round trip jittered between 20 and 120 milliseconds, and the drones oscillated, occasionally clipping a shelf, because their controllers were starved of fresh commands.

Dilemma: Buy a dedicated low-latency radio and keep the central controller, or move the control loop onto the drones and let the server send only the slow reference (which sector each drone owns).

Decision: They moved control onto the drones, running a local MPC at 100 hertz that tracked the server's sector reference and a consensus term over the two nearest drones for spacing.

How: The server's role shrank to publishing a sector assignment a few times per second; each drone solved its own 0.5-second-horizon MPC on-board and broadcast its predicted trajectory to neighbors.

Result: The oscillation vanished because the control loop no longer crossed the network, throughput rose as the server stopped being a bottleneck, and a server reboot mid-shift no longer grounded the swarm.

Lesson: Keep the fast loop local and the network for the slow reference. Decentralizing control removed both the latency and the single point of failure in one change.

Library Shortcut: A Local MPC Step in a Few Lines With cvxpy

The per-robot MPC above is a small convex quadratic program, and you do not hand-roll the solver. With cvxpy, one robot's local problem, dynamics, input limits, neighbor coupling and all, is a dozen lines, and the library handles the QP formulation and the underlying solver call:

import cvxpy as cp
import numpy as np

H = 10                                  # horizon steps
x = cp.Variable((2, H + 1))             # this robot's predicted positions
u = cp.Variable((2, H))                 # this robot's velocity commands
cost, cons = 0, [x[:, 0] == x0]         # x0 is THIS robot's current state
for t in range(H):
    cost += cp.sum_squares(x[:, t] - ref[:, t])              # track the reference
    for xj, dij in neighbors:                                # only sensed neighbors
        cost += rho * cp.sum_squares((x[:, t] - xj[:, t]) - dij)  # formation coupling
    cons += [x[:, t + 1] == x[:, t] + dt * u[:, t]]          # local dynamics
    cons += [cp.norm(u[:, t], "inf") <= v_max]               # this robot's speed limit
cp.Problem(cp.Minimize(cost), cons).solve()
command = u.value[:, 0]                  # apply only the first step, then re-solve
Code 39.6.1: One robot's local distributed-MPC step in cvxpy. Every quantity is local to this robot or sensed from a neighbor; there is no global state variable and no central solve. Each of the $n$ robots runs this same small program in parallel, the only coupling being the neighbor trajectories xj they broadcast.

5. Collision Avoidance and Safety Constraints Advanced

Formation laws pull robots together; safety is what keeps them from colliding while they do it. The decentralized standard is the reciprocal velocity obstacle, refined as Optimal Reciprocal Collision Avoidance (ORCA). The idea is geometric and local. For a pair of robots $i$ and $j$, the set of relative velocities that would lead to a collision within a time horizon $\tau$ forms a cone, the velocity obstacle; ORCA carves out of velocity space a half-plane of allowed velocities for robot $i$, expressed as

$$\bigl(v_i - v_i^{\text{opt}}\bigr) \cdot n_{ij} \;\ge\; b_{ij},$$

where $n_{ij}$ is the half-plane normal and $b_{ij}$ its offset, both computed from the relative position and velocity of the pair. The word reciprocal is the crucial part: each robot takes responsibility for exactly half of the avoidance, so two robots on a head-on course each veer by half the needed amount and pass cleanly, with no negotiation and no shared controller. Robot $i$ simply intersects the half-planes from all its neighbors and picks the allowed velocity closest to the one its formation law wanted. Figure 39.6.2 shows the construction for a single pair.

Velocity space for robot i x-velocity y-velocity collision cone (velocity obstacle) allowed half-plane nᵢⱼ vᵢ opt (wanted) vᵢ (chosen)
Figure 39.6.2: ORCA in the velocity space of robot $i$. The shaded cone is the set of relative velocities that would cause a collision within the horizon. ORCA replaces the cone with a single linear constraint, the allowed half-plane with normal $n_{ij}$; the robot keeps the formation-preferred velocity $v_i^{\text{opt}}$ if it is allowed, otherwise projects it onto the half-plane boundary to get the chosen $v_i$. Each robot does this independently for every neighbor, which is why no negotiation is needed.

A more general framing comes from control barrier functions (CBFs), which encode a safety set (for example, stay at least a radius $d$ from every neighbor) as a function $h(x) \ge 0$ and certify that the control keeps $h$ from going negative. At a high level, a CBF turns each safety requirement into one linear inequality on the control input, and the robot solves a tiny quadratic program at every tick: track the formation command as closely as possible subject to every safety inequality. The structure is identical to ORCA's, a preferred command projected onto a feasible region of safe commands, and it composes naturally with the distributed MPC of Section 4, where the safety inequalities simply become additional constraints in each robot's local optimization. Either way, safety is enforced locally, on each robot, with no global collision checker in the loop.

Research Frontier: Learned and Certified Decentralized Safety (2024 to 2026)

The current frontier fuses the guarantees of CBFs with the expressiveness of learning. A vigorous line learns decentralized control barrier functions and safe controllers jointly with graph neural networks, so that a single network, evaluated locally on each robot using only neighbor observations, both tracks the formation and certifies collision avoidance for swarms of dozens to hundreds of agents; the GCBF and GCBF+ work of Zhang and collaborators (2023 to 2024) is representative, scaling certified safety far beyond what hand-built barriers reached. A parallel thread couples ORCA-style avoidance with multi-agent reinforcement learning so the avoidance policy is learned rather than constructed, and another pushes neural-network verification toward the runtime guarantees that classical $\lambda_2$ analysis gives the consensus law. The open problem is the one Section 3 named: recovering, for a learned decentralized controller, a stability and safety certificate as crisp as the Laplacian spectrum gives the linear law. We meet the learning side of this story next, in Section 39.7.

6. A Decentralized Formation Controller, Running Intermediate

The code below puts Sections 2 and 3 together on five ground robots arranged on a ring communication graph: each robot senses only its two ring neighbors and runs the offset consensus law $u_i = -\sum_{j \in \mathcal{N}(i)} ((x_i - x_j) - \delta_{ij})$ with the $\delta_{ij}$ taken from a target pentagon. No robot ever assembles the global state; each integrates its own command independently. The script reports the team disagreement (the mean neighbor-relative error) before and after, and checks that the realized shape matches the pentagon up to a global translation, because consensus fixes the shape, not the absolute location.

import numpy as np

rng = np.random.default_rng(7)
N = 5

# Ring adjacency: each robot talks only to its two ring neighbors.
A = np.zeros((N, N))
for i in range(N):
    A[i, (i - 1) % N] = 1.0
    A[i, (i + 1) % N] = 1.0

# Desired formation: a regular pentagon, stored as neighbor-relative offsets.
angles = 2.0 * np.pi * np.arange(N) / N
formation = np.stack([np.cos(angles), np.sin(angles)], axis=1)

x = rng.standard_normal((N, 2)) * 3.0        # scattered initial positions
dt, gain, steps = 0.05, 1.0, 400

def disagreement(x):
    err = 0.0
    for i in range(N):
        for j in range(N):
            if A[i, j] > 0:
                want = formation[i] - formation[j]    # desired offset delta_ij
                have = x[i] - x[j]                     # realized offset
                err += np.sum((have - want) ** 2)
    return np.sqrt(err / np.sum(A))

print(f"robots N             : {N}")
print(f"initial disagreement : {disagreement(x):.4f}")

for t in range(steps):
    u = np.zeros((N, 2))
    for i in range(N):                                # each robot, independently
        for j in range(N):
            if A[i, j] > 0:                           # only its sensed neighbors
                want = formation[i] - formation[j]
                u[i] -= gain * ((x[i] - x[j]) - want) # local consensus control law
    x = x + dt * u                                    # robot i integrates its own action

print(f"final disagreement   : {disagreement(x):.4f}")
centered = x - x.mean(axis=0)
target = formation - formation.mean(axis=0)
shape_error = np.sqrt(np.mean(np.sum((centered - target) ** 2, axis=1)))
print(f"shape error vs target: {shape_error:.4e}")
print(f"converged to formation: {shape_error < 1e-2}")
Code 39.6.2: Decentralized formation control for five robots on a ring graph. The doubly nested loop is purely pedagogical bookkeeping; physically, each robot $i$ runs only its own inner computation over its own neighbors, with no access to the full array x.
robots N             : 5
initial disagreement : 5.1376
final disagreement   : 0.0000
shape error vs target: 8.0793e-13
converged to formation: True
Output 39.6.2: From a scattered start with disagreement above five, the purely local law drives the team to a disagreement of zero and a shape error of $8 \times 10^{-13}$, floating-point exactness. Five robots, each blind to all but two neighbors, jointly realized the global pentagon, the embodied counterpart of the exact distributed gradient in Output 1.1.1.

The convergence to floating-point zero is not luck; it is the eigenvalue guarantee of Section 3 made visible. The ring on five nodes is connected, so $\lambda_2 > 0$, so every disagreement mode decays and the formation error vanishes exponentially. Had we cut one edge and disconnected the ring into a path with an isolated node, $\lambda_2$ would still be positive (a path is connected), but had we removed enough edges to split the graph, $\lambda_2$ would drop to zero and the two pieces would settle into different shapes, the formation broken precisely because the spectral condition failed. The demo is small, but the mechanism is exactly the one that flies a hundred-drone show.

7. Where Decentralized Control Sits in the Robotics Stack Beginner

It is worth closing by placing this loop in the larger picture the chapter has built. Section 39.5 produced the shared belief; decentralized control consumes it, because every $x_i$ and $x_j$ a robot plugs into its law is an estimate from that belief, not ground truth, which is why robust control laws are designed to tolerate the residual uncertainty the fusion leaves behind. Section 39.2 produced the coordination; decentralized control realizes it, turning a coordinated assignment of goals into the moment-to-moment motion that achieves them. And the loop itself is the real-time control loop of Section 34.8, now multiplied across a team: the same hard latency budget, the same on-device computation, the same intolerance for a network detour, but coupled to neighbors so that many such loops cohere into one swarm.

This is the field-time payoff of the whole book's argument. The coordinator that Chapter 2 showed we could not centralize without creating a bottleneck and a single point of failure is, at control time, simply absent: global formation, safe motion, and stable behavior are all produced by controllers that each see a fragment of the world and trust the rest of the team to do the same. The next section keeps the architecture and changes the controller, replacing the hand-designed consensus and MPC laws with policies learned through multi-agent reinforcement learning, the natural next step once you accept that the control loop is decentralized and ask what the best local policy actually is.

Exercise 39.6.1: Decentralized, Distributed, or Centralized? Conceptual

For each controller, classify it as centralized, distributed, or decentralized using the definitions in Section 1, and justify by naming exactly what information each robot's action depends on: (a) every robot sends its pose to a ground station that solves one optimization and broadcasts all commands; (b) each robot runs $u_i = -\sum_{j \in \mathcal{N}(i)}(x_i - x_j)$ using only what its own camera sees; (c) each robot runs a local MPC but first exchanges predicted trajectories with its neighbors over a radio link. Then state, for each, what happens to the team if one robot's radio fails, and why the failure modes differ.

Exercise 39.6.2: Connectivity Sets the Speed Coding

Extend Code 39.6.2 to record the disagreement at every step, then run it on three communication graphs over the five robots: the full ring (each robot sees two neighbors), a single path (cut one ring edge), and a denser graph where each robot also sees its second-nearest neighbor on each side. For each graph, compute the second-smallest Laplacian eigenvalue $\lambda_2$ with numpy.linalg.eigvalsh and overlay the three disagreement-versus-step curves. Confirm empirically that a larger $\lambda_2$ yields faster convergence, and explain the one graph (if any) for which the formation fails to form.

Exercise 39.6.3: The Cost of the Reference Round Trip Analysis

Suppose the control loop must run at 200 hertz (one command every 5 milliseconds) and the coordination reference is updated at 5 hertz (every 200 milliseconds). A centralized design routes every control step through a server with a 30-millisecond round-trip latency; a decentralized design runs control on-board and routes only the reference through the same server. For each design, state whether the loop's deadline is met and quantify the worst-case staleness of the information each robot acts on. Argue from these two numbers why control must be decentralized even when coordination can safely remain centralized, and connect your answer to the latency argument in Section 34.8.