Part VI: Distributed AI and Multi-Agent Systems
Chapter 27: Distributed Artificial Intelligence

Centralized, Decentralized, and Hybrid AI

"They put me in charge of everyone's plans. For a glorious afternoon I was optimal. Then I crashed, and twelve agents discovered they had never learned to ask each other anything."

A Coordinator That Was a Single Point of Failure
Big Picture

The same centralized-versus-decentralized spectrum that organized how we distribute gradients now organizes how we distribute reasoning: a multi-agent system can put one coordinator agent in charge of planning for everyone, let peers negotiate with only local information and no central authority, or stack the two into a hierarchy of local coordinators under a global one. In Section 1.4 the axis ran across parameter servers, all-reduce, and gossip averaging, where the thing being combined was a vector. Here the thing being combined is a decision: who does which task, what plan to follow, what to believe. The trade-offs are the very same ones: a centralized planner is simple and can be globally optimal but is a bottleneck and a single point of failure, while a decentralized swarm is resilient and scalable but pays for it in coordination cost and weaker global guarantees. This section makes that mapping precise and then measures it on a task-allocation problem you can run yourself.

Distributed problem solving, the subject of Section 27.2, asked how a group of agents can jointly solve a problem none of them holds entirely. It left open the question of authority: when the agents must agree on who does what, who decides? That question has the same three answers it had for distributed training, because it is the same question one level up. In data-parallel training the unit of coordination is a gradient and the authority structure decides how partial gradients are combined. In a multi-agent system the unit of coordination is a decision and the authority structure decides how partial intentions are combined. Recognizing that these are the same spectrum, viewed at the data level and at the reasoning level, lets us reuse everything Chapter 2 taught about consensus, bottlenecks, and failure.

[Figure: three panels. Centralized: one planner $C$ assigns work; simple and globally optimal, but a bottleneck and a single point of failure. Decentralized: peers negotiate with local information only; resilient and scalable, but more messages and a weaker global view. Hybrid / hierarchical: local coordinators under a global coordinator $G$; global coherence plus local autonomy, the common practical middle.]
Figure 27.3.1: The control-architecture spectrum for multi-agent intelligence. Left, a centralized coordinator $C$ plans and pushes assignments to worker agents (arrows show the flow of authority). Middle, decentralized peers exchange proposals over their local links with no center. Right, a hybrid hierarchy places a global coordinator $G$ over local coordinators that each manage a small team. The same spectrum governed how partial gradients were combined in Section 1.4; here it governs how partial decisions are combined.

1. Centralized: One Agent Plans for Everyone Beginner

In a centralized architecture a single coordinator agent holds, or gathers, the information needed to plan, decides the allocation of work, and instructs the others to carry it out. The workers report what they can do and at what cost; the coordinator solves the resulting optimization and pushes assignments back. This is the structure of the orchestrator pattern that dominates today's large-language-model agent frameworks: one planner model reads the task, decomposes it into subtasks, and dispatches each to a tool or a subordinate executor. Because the coordinator sees everything, it can in principle compute a globally optimal plan, and because the logic lives in one place, the system is simple to reason about, debug, and observe. There is one log, one decision-maker, one place to look when something goes wrong.

The cost of that simplicity is structural. Every decision flows through one agent, so the coordinator's throughput caps the whole system's, exactly the bottleneck that Chapter 2 identified for any single point of coordination, and gathering global state before each decision is itself a communication round that grows with the number of agents. Worse, if the coordinator fails, the system has no plan and no way to make one; the workers were never given the authority or the information to continue. Centralization concentrates both capability and fragility in the same node.
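The throughput cap can be made concrete with a back-of-the-envelope model. The sketch below assumes a steady state in which every decision must pass through the coordinator before a worker can act on it; the function name and the rates are hypothetical, chosen only for illustration.

```python
def system_throughput(coordinator_rate, worker_rate, n_workers):
    """Decisions per second the whole system completes, assuming every
    decision must first pass through the single coordinator."""
    return min(coordinator_rate, worker_rate * n_workers)


# Hypothetical rates: the coordinator plans 50 decisions/s, each worker executes 10/s.
rates = [system_throughput(50.0, 10.0, n) for n in (1, 2, 5, 10, 100)]
print(rates)   # [10.0, 20.0, 50.0, 50.0, 50.0]
```

Under these assumed rates, workers beyond the fifth add nothing: the coordinator's rate, not the fleet's size, sets the ceiling.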

Key Insight: The Coordinator Buys Optimality with a Bottleneck

A central coordinator can be globally optimal because it sees global state, and it is simple because all the deciding happens in one place. Both properties come from the same fact, that one agent holds the authority, and that same fact makes it a throughput bottleneck and a single point of failure. You do not get to keep the optimality and the simplicity while discarding the bottleneck and the fragility; they are different views of one design choice. The engineering question is never "is centralization good?" but "is the global view worth the central dependency for this workload?"

2. Decentralized: Peers Coordinate with Local Information Intermediate

In a decentralized architecture there is no coordinator. Each agent knows only its own state and what its neighbors tell it, and the global allocation emerges from many local interactions: agents announce tasks, bid on the ones they can do cheaply, and award them by a local rule, with no node ever holding the whole picture. This is the world of the Contract-Net protocol that Section 27.5 formalizes, of the swarms in Chapter 31 where simple peers produce complex collective behavior, and of gossip-style coordination where information diffuses through pairwise exchange rather than a central broadcast. Because no single node is essential, the loss of any one agent degrades the system gracefully rather than halting it, and because work is shared by exchanging messages with neighbors rather than funneling through one node, the design scales to far more agents than a single coordinator could serve.

What decentralization gives up is the global view, and with it the easy guarantees. No agent can certify that the emergent allocation is globally optimal, because no agent ever sees the global cost; the system can settle into a locally sensible but globally suboptimal arrangement. Coordination also costs more messages, since agreement that a coordinator could reach by reading shared state must instead be negotiated through rounds of announcements and bids. Observability suffers too: there is no single log of "the plan", only a distributed state scattered across agents that must be reconstructed to be understood. These are the reasoning-level forms of the resilience-for-coherence trade that gossip averaging made at the data level in Section 1.4.
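A two-agent, two-task toy case shows how a locally sensible rule can land on a globally poor allocation. The sketch below is illustrative only: the cost values are made up, and the limit of one task per agent is an added assumption that couples the tasks. The function names are hypothetical.

```python
from itertools import permutations

# Hypothetical costs: cost[agent][task]. Assumed limit: one task per agent.
cost = {
    "A": [1, 2],
    "B": [2, 10],
}
n_tasks = 2


def greedy_local_bidding(cost, n_tasks):
    """Award tasks one at a time to the lowest bidder that still has capacity."""
    free = set(cost)
    assign = []
    for t in range(n_tasks):
        winner = min(sorted(free), key=lambda a: cost[a][t])
        assign.append(winner)
        free.remove(winner)              # the winner is now full
    return assign


def global_optimum(cost, n_tasks):
    """Brute-force the best one-to-one assignment; this needs the whole cost table."""
    return min(
        (list(p) for p in permutations(cost, n_tasks)),
        key=lambda p: sum(cost[a][t] for t, a in enumerate(p)),
    )


def total(assign, cost):
    return sum(cost[a][t] for t, a in enumerate(assign))


greedy = greedy_local_bidding(cost, n_tasks)
best = global_optimum(cost, n_tasks)
print("greedy :", greedy, total(greedy, cost))   # ['A', 'B'] 11
print("optimal:", best, total(best, cost))       # ['B', 'A'] 4
```

Each greedy award is the best choice given what the bidders know at that moment. The loss appears only when the two awards are viewed together, which is exactly the view no peer holds.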

Fun Note: Nobody Is in Charge, and That Is the Feature

Ask a centralized system "who decided this?" and it points at the coordinator. Ask a decentralized swarm the same question and the honest answer is "the protocol did, a little bit at each agent." This is unsettling the first time you debug one: there is no single place where the decision was made, only a sequence of local agreements whose sum is the global outcome. The upside is that there is also no single place where the decision can fail to be made.

3. Hybrid: Hierarchies of Coordinators Intermediate

Real systems rarely sit at either pole. The common practical middle is hierarchical: local coordinators each manage a small team with the simplicity and near-optimality of centralization within their group, and a global coordinator composes the teams without having to plan for every agent directly. A global planner that had to assign work to a thousand agents one by one would be an impossible bottleneck; a thousand peers negotiating pairwise would drown in messages and never converge to anything coherent. A two-level hierarchy of, say, ten local coordinators each managing a hundred agents keeps each coordinator's problem small while preserving a single point of global coherence. This is exactly the hierarchical agent-team structure that Chapter 32 builds for orchestrating large fleets of language-model agents, and it mirrors the hierarchical-then-flat collective topologies that made all-reduce scale in Chapter 4.
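The sizes in that comparison are easy to check. The sketch below uses a crude proxy, the number of direct subordinates or links each design must handle, rather than a real message model; the function names are hypothetical.

```python
def flat_coordinator_load(n_agents):
    """One coordinator plans for every agent directly."""
    return n_agents


def pairwise_links(n_agents):
    """Distinct peer pairs if every agent may negotiate with every other."""
    return n_agents * (n_agents - 1) // 2


def two_level_loads(n_agents, n_local):
    """Direct subordinates per coordinator in a two-level hierarchy."""
    team_size = -(-n_agents // n_local)          # ceiling division
    return {"global": n_local, "local": team_size}


n = 1000
print(flat_coordinator_load(n))    # 1000
print(pairwise_links(n))           # 499500
print(two_level_loads(n, 10))      # {'global': 10, 'local': 100}
```

By this proxy no coordinator in the two-level design handles more than a hundred subordinates, against a thousand for the flat planner and nearly half a million possible pairings for fully connected peers.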

The hybrid design is a tunable dial, not a fixed point. Push authority up and you approach centralization: stronger global guarantees, weaker fault tolerance. Push it down and you approach decentralization: more resilience and scale, weaker global coherence. Choosing where to set the dial is the architectural decision, and it is made per workload by asking which failures you can tolerate and which guarantees you cannot do without.

Thesis Thread: The Coordinator Spectrum Returns at the Reasoning Level

The centralized-decentralized-hybrid spectrum is one of the book's signature arcs, and this section is where it crosses from data to decisions. It was introduced for combining gradients in Section 1.4 (parameter server, all-reduce, gossip), grounded in the consensus and bottleneck analysis of Chapter 2, and it now organizes how agents combine intentions. It will return once more as the orchestration topology of Chapter 32. Whenever a later chapter asks "who decides?", it is setting a dial on this same spectrum; the trade-offs never change, only the unit being coordinated.

4. The Same Problem, Two Architectures Intermediate

To make the trade-offs concrete rather than rhetorical, we solve one task-allocation problem two ways and measure what differs. There are $K$ agents and $M$ tasks; assigning task $t$ to agent $a$ costs $c_{a,t}$, and we want an assignment $\pi$ minimizing the total,

$$\min_{\pi}\; \sum_{t=1}^{M} c_{\pi(t),\, t}, \qquad \pi(t) \in \{1, \dots, K\}.$$

The centralized coordinator reads the whole cost matrix and picks, for each task, the agent with the lowest cost, a global argmin per column that is optimal for this unconstrained version. The decentralized peers never see the matrix; for each task they run a Contract-Net round, announcing the task, collecting one bid per agent, and awarding it to the lowest bidder. With no capacity limits the two converge to the identical allocation, which is exactly what lets us isolate the real differences: how many messages each spends, and what happens when an agent fails. The code below runs both and then kills agent 2 to test robustness.

import numpy as np

# A task-allocation problem: M tasks must each be assigned to exactly one of
# K agents. Cost c[a, t] is what agent a pays to do task t. We minimize total
# cost. We solve it two ways and compare quality, messages, and robustness.

rng = np.random.default_rng(7)
K, M = 5, 12                                   # agents, tasks
cost = rng.integers(1, 100, size=(K, M))       # cost[a, t]: agent a doing task t


def total_cost(assign, cost, dead=None):
    # assign[t] = agent index handling task t; a task held by a dead agent counts as unassigned.
    tot, unassigned = 0, 0
    for t, a in enumerate(assign):
        if a is None or (dead is not None and a == dead):
            unassigned += 1
        else:
            tot += cost[a, t]
    return tot, unassigned


# ---- Centralized: one coordinator sees the full cost matrix and optimizes. ----
def centralized(cost):
    K, M = cost.shape
    assign = [int(np.argmin(cost[:, t])) for t in range(M)]   # global argmin per task
    msgs = K + M                  # K agents report costs in, M assignments sent out
    return assign, msgs


# ---- Decentralized: peers bid via a Contract-Net-style round, local info only. ----
def decentralized(cost, dead=None):
    K, M = cost.shape
    assign = [None] * M
    msgs = 0
    for t in range(M):
        # Announce task t to all agents; each alive agent replies with one bid.
        bids = {}
        for a in range(K):
            if dead is not None and a == dead:
                continue
            msgs += 2                          # announce + bid for this (task, agent)
            bids[a] = cost[a, t]
        if bids:
            winner = min(bids, key=bids.get)   # lowest local bid wins; award message
            msgs += 1
            assign[t] = winner
    return assign, msgs


c_assign, c_msgs = centralized(cost)
c_cost, _ = total_cost(c_assign, cost)
d_assign, d_msgs = decentralized(cost)
d_cost, _ = total_cost(d_assign, cost)

print("agents K, tasks M       :", K, M)
print("centralized total cost  :", c_cost, " messages:", c_msgs)
print("decentralized total cost:", d_cost, " messages:", d_msgs)
print("same allocation         :", c_assign == d_assign)

# ---- Robustness: agent 2 fails AFTER the coordinator already assigned work. ----
dead = 2
c_after, c_un = total_cost(c_assign, cost, dead=dead)
# Decentralized re-runs its bidding round with the failed agent simply absent.
d_assign2, d_msgs2 = decentralized(cost, dead=dead)
d_after, d_un = total_cost(d_assign2, cost, dead=dead)

print("\n-- agent", dead, "fails --")
print("centralized   : tasks dropped =", c_un, " (frozen plan, no re-bid)")
print("decentralized : tasks dropped =", d_un,
      " recovered cost =", d_after, " extra messages:", d_msgs2)
Code 27.3.1: One task-allocation problem solved by a centralized coordinator (global argmin over the full cost matrix) and by decentralized peers (a per-task Contract-Net bidding round using only local costs), instrumented to count messages and to test what each does when an agent fails mid-run.
agents K, tasks M       : 5 12
centralized total cost  : 223  messages: 17
decentralized total cost: 223  messages: 132

-- agent 2 fails --
centralized   : tasks dropped = 1  (frozen plan, no re-bid)
decentralized : tasks dropped = 0  recovered cost = 229  extra messages: 108
Output 27.3.1: Both architectures reach the identical optimal cost of 223, so quality is a wash here. The difference is everything else: the decentralized protocol spends 132 messages to the coordinator's 17 (nearly eight times as many), but when agent 2 dies the frozen central plan drops a task it can no longer execute, while the decentralized peers simply re-bid without the dead agent and finish all twelve tasks at a slightly higher cost of 229.

The numbers tell the whole story of the section. Solution quality was identical because the problem was easy and the global view bought nothing the local bids could not find, which is common: centralization's optimality edge only matters when constraints couple the tasks so that no greedy local rule suffices. The message cost was where the architectures parted, the decentralized protocol paying a large multiple for coordinating through negotiation rather than reading shared state. And robustness inverted the picture: the very central plan that was cheap to compute became a liability the moment its assumptions changed, because no worker had the authority or information to repair it, whereas the decentralized system treated a failed agent as just one fewer bidder. Faster and cheaper to plan, but brittle, versus costlier to coordinate, but self-healing, is the trade you are always making.
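The message counts in Output 27.3.1 follow directly from the counting rules in Code 27.3.1. The coordinator receives one cost report per agent and sends one assignment per task, while the peers spend an announce and a bid per live agent plus one award for every task. Writing $N$ for the message count (a symbol introduced here for convenience):

$$N_{\text{central}} = K + M = 5 + 12 = 17, \qquad N_{\text{peer}} = M\,(2K + 1) = 12 \cdot 11 = 132.$$

With agent 2 absent, the re-bid round has $K - 1 = 4$ bidders:

$$N_{\text{re-bid}} = M\,\bigl(2(K - 1) + 1\bigr) = 12 \cdot 9 = 108.$$

The ratio $132 / 17 \approx 7.8$ is the "nearly eight times" of the output. Note that these are counts only and ignore payload size: each centralized report carries an agent's costs for all $M$ tasks, whereas each bid carries a single cost.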

Practical Example: The Orchestrator That Became the Outage

Who: A platform team running a customer-support system built from language-model agents.

Situation: A single orchestrator agent read each incoming ticket, planned the steps, and dispatched them to specialist executor agents (billing, refunds, account recovery).

Problem: The orchestrator was the only node that knew the plan, and under a traffic spike it became the throughput ceiling for the whole fleet; when it crashed, every in-flight ticket stalled because no executor knew what came next.

Dilemma: Keep the clean, observable central orchestrator and accept that it is a bottleneck and a single point of failure, or move to peer executors that pass work directly and lose the single plan-level log the on-call team relied on.

Decision: They went hybrid rather than to either pole, splitting tickets across several local orchestrators, each owning a category, under a thin global router.

How: The global router did only coarse category routing (cheap, low state), while each local orchestrator planned within its category and could fail and restart without touching the others, mirroring the hierarchy in Figure 27.3.1.

Result: Peak throughput rose because no single planner gated all traffic, a crashed local orchestrator took down only its own category, and the team kept a per-category plan log instead of one global one, an observability cost they judged worth paying.

Lesson: The fix for a coordinator bottleneck is usually not full decentralization but moving the dial: enough hierarchy to remove the single chokepoint while keeping coordination problems small enough to stay near-optimal and observable.

5. Observability and the Cost of a Global View Advanced

A theme that the experiment surfaces but does not fully name is observability, the practical cost of knowing what the system is doing. Centralization makes observability nearly free, because the coordinator already holds the global state that a monitor wants to read; one log captures the plan, and one query answers "what is the system doing?" Decentralization scatters that state across agents, so reconstructing a global view becomes its own distributed computation, with the same staleness and consistency caveats that Chapter 2 attached to any global snapshot. This is why decentralized systems are harder not only to guarantee but to debug: the information needed to understand them is exactly the information they were designed not to centralize.
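The contrast can be sketched in a few lines. The code below is a toy model under stated assumptions: `AgentState`, `read_central_log`, and `reconstruct_view` are hypothetical names, time is a logical counter, and staleness is taken to be the age of the oldest reply merged into the view.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    agent_id: int
    tasks: list          # task ids this agent currently holds
    updated_at: int      # logical time of the agent's last local update


def read_central_log(plan_log):
    """Centralized: the coordinator's log already is the global view."""
    return dict(plan_log), 1                     # one query


def reconstruct_view(agents, now):
    """Decentralized: poll every agent and merge the replies."""
    view, queries, worst_staleness = {}, 0, 0
    for agent in agents:
        queries += 1
        for task in agent.tasks:
            view[task] = agent.agent_id
        worst_staleness = max(worst_staleness, now - agent.updated_at)
    return view, queries, worst_staleness


plan_log = {0: 1, 1: 0, 2: 2, 3: 1}              # task -> agent
agents = [
    AgentState(0, [1], updated_at=9),
    AgentState(1, [0, 3], updated_at=4),
    AgentState(2, [2], updated_at=10),
]
central, central_queries = read_central_log(plan_log)
merged, peer_queries, staleness = reconstruct_view(agents, now=10)
print(central == merged, central_queries, peer_queries, staleness)   # True 1 3 6
```

Both routes recover the same task-to-agent map here, but the merged view costs one query per agent and is only as fresh as its oldest reply.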

The cost of a global view, then, is paid in one of two currencies. A centralized system pays it up front and continuously, as the communication and bottleneck cost of gathering global state before every decision. A decentralized system avoids that standing cost but pays a lump sum whenever it actually needs the global view, for monitoring, for a global guarantee, or for a human to understand what happened. Hybrid architectures let you choose how much of each currency to spend, which is the deeper reason they dominate practice: most systems need a strong-enough global view often enough that pure decentralization is inconvenient, but not so constantly that pure centralization is affordable.

Library Shortcut: Coordinator and Peer Topologies in LangGraph

Code 27.3.1 hand-rolled both the central argmin and the peer bidding loop to expose the message counts. In practice, agent frameworks let you declare the topology while they supply the coordination plumbing. A centralized planner-executor and a decentralized peer handoff then differ only in how you wire the graph:

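from langgraph.graph import StateGraph, START, END

# State, plan_and_assign, run_worker, route_or_finish, run_peer_with_handoff,
# and pick_next_peer_or_end are application-defined and not shown here.
# The routing functions return the name of the next node, or END to stop.

# Centralized: every executor edge returns to the coordinator, which re-plans.
g = StateGraph(State)
g.add_node("coordinator", plan_and_assign)        # one node holds the plan
g.add_edge(START, "coordinator")                  # every run enters at the center
for worker in ("billing", "refunds", "recovery"):
    g.add_node(worker, run_worker)
    g.add_edge(worker, "coordinator")             # report back to the center
g.add_conditional_edges("coordinator", route_or_finish)

# Decentralized: peers hand off to each other directly, no return-to-center.
g2 = StateGraph(State)
for peer in ("billing", "refunds", "recovery"):
    g2.add_node(peer, run_peer_with_handoff)      # each peer decides the next hop
    g2.add_conditional_edges(peer, pick_next_peer_or_end)
g2.add_edge(START, "billing")                     # entry peer is an arbitrary choice here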
Code 27.3.2: The same two architectures as a wiring choice in LangGraph. The framework handles state passing, message routing, retries, and checkpointing; the only thing you choose is whether edges return to a coordinator (centralized) or flow peer-to-peer (decentralized), so the dozens of lines of protocol bookkeeping in Code 27.3.1 collapse to the graph's edge structure.

Research Frontier: Centralized Versus Decentralized LLM-Agent Topologies (2024 to 2026)

The centralized-decentralized question has become an active empirical topic for language-model agents now that fleets of them solve real tasks. Microsoft's AutoGen and the AG2 project that forked from it formalize centralized group-chat managers alongside decentralized peer conversations. Studies of multi-agent debate and society-of-mind setups report that decentralized peer exchange can beat a single planner on reasoning quality while costing many more model calls, so the message-cost trade of Output 27.3.1 reappears as a token-cost trade. Work on hierarchical and graph-structured agent organizations (for example MetaGPT's role hierarchy in 2024 and a wave of 2024 to 2026 papers analyzing how agent communication topology shapes accuracy, robustness, and cost) is essentially mapping this same spectrum experimentally, asking which topology is worth its coordination overhead for which task. The open frontier is adaptive topology: systems that move along the dial at run time, centralizing when a global guarantee is needed and decentralizing when resilience and parallelism matter more. We build the hierarchical end of this spectrum in Chapter 32.

We now have the control-architecture spectrum for intelligence, the matching trade-offs from distributed systems, and a measured sense of what centralization and decentralization actually cost. One specific coordination pattern deserves its own treatment, because it organizes decentralized agents around a shared structure rather than around a coordinator or pure peer messaging: the blackboard, where agents read and write a common workspace and the global solution accretes on it. That architecture is the subject of Section 27.4.

Exercise 27.3.1: Where Does the Dial Sit? Conceptual

For each system, place it on the centralized-decentralized-hybrid spectrum of Figure 27.3.1 and justify the placement by naming the dominant pressure (global optimality, throughput, fault tolerance, or observability): (a) a chess engine that explores one search tree on one machine; (b) a fleet of warehouse robots that must keep moving even if the warehouse controller reboots; (c) a language-model agent app where one planner decomposes a user request and calls tools; (d) a sensor network estimating a field where any node may drop offline. For each, state which single failure or scaling pressure would force you to move the dial, and in which direction.

Exercise 27.3.2: Make Centralization Actually Win Coding

In Code 27.3.1 both architectures reached the identical cost because each task was independent. Add a capacity constraint so that each agent may take at most $\lceil M/K \rceil$ tasks. Re-run the greedy decentralized bidding (lowest bidder wins, but a full agent must drop out) and compare its total cost to a centralized solver that respects the same constraint (use scipy.optimize.linear_sum_assignment on a suitably tiled cost matrix, or any balanced-assignment method). Report the cost gap and explain why coupling the tasks through a shared constraint is exactly what gives the global view its advantage.

Exercise 27.3.3: Price the Two Currencies of a Global View Analysis

Using Section 5's framing, model the total coordination cost of each architecture as a function of agent count $K$ for a stream of $T$ decisions. Assume the centralized coordinator gathers global state at a cost proportional to $K$ before each of the $T$ decisions, while the decentralized peers pay a per-decision negotiation cost proportional to $K$ and additionally pay a lump-sum reconstruction cost proportional to $K$ each time an external monitor (arriving $m$ times) needs a global snapshot. Give each of the three costs its own constant of proportionality. Write both totals, find the crossover in $T$, $K$, and $m$ at which decentralization becomes cheaper, and connect the result to why hybrid architectures dominate when monitoring is frequent but not constant.