"They call me an agent now, which is generous. I am a process that holds some context, waits on a model I do not own, and pokes the world through tools. Put three of me in a room and you have a distributed system, not a conversation."
An Agent Stuck Waiting on a Lock
An LLM agent is not a clever prompt; it is a stateful distributed component that perceives its context, reasons by calling a model served on a remote fleet, and acts through tools and messages, and a system of such agents is a distributed system in the full sense of this book. The moment you place a second agent beside the first, every concern that organized the previous thirty-one chapters returns at once: the agents must communicate, and each message is an expensive model call rather than a cheap packet; they must coordinate, so the overhead and the protocols of Chapter 29 apply directly; they must share state, so consistency becomes a question; and they will fail, so reliability becomes a design problem. This section establishes the reframe that drives the whole chapter: orchestrating agents well is a distributed-systems discipline. It builds a single agent from first principles as a perceive-reason-act loop so the component is concrete, then names the system-level concerns that the rest of the chapter develops.
The classical agent of Part VI was an abstraction: an entity that perceives its environment, decides what to do, and acts, possibly alongside other agents that it must coordinate with. Chapter 27 framed distributed artificial intelligence around this idea, and Section 29.1 made the perceive-decide-act cycle the unit of a multi-agent system. For most of that history the "decide" step was hand-written: a rule base, a planner, a learned policy. What changed, and what this chapter is about, is that the decide step is now a call to a large language model. That single substitution turns the agent from a self-contained program into a distributed component, because the reasoning no longer happens inside the agent at all. It happens on the distributed LLM-serving fleet of Chapter 24, across the network, on machines the agent does not own and cannot see.
This is the through-line of the chapter. Tool use and function calling (Section 32.2) give the agent its actuators. Roles, planning, and the planner-executor split (Section 32.3) decompose the work. Parallel and distributed workflows (Section 32.4) place agents across machines. Multi-agent reasoning patterns such as debate and reflection (Section 32.5) make several agents argue toward a better answer. Communication protocols (Section 32.6) carry their messages. Shared and distributed memory (Section 32.7) holds their common state. Orchestration engines (Section 32.8) run the whole graph. Evaluation (Section 32.9) measures whether it works, and operating at scale (Section 32.10) confronts the cost, latency, and reliability that distribution imposes. Every one of these is a distributed-systems topic wearing an agent costume. The figure below shows the costume and the system underneath it.
1. The Agent Is a Stateful Service, Not a Prompt (Beginner)
A useful first move is to stop thinking of an agent as a block of prompt text and start thinking of it as a service with an address, some state, and a request-response loop. The state is the agent's context: the running transcript of what it has perceived, what it has decided, and what its tools returned. A request arrives (a task, or a message from another agent), the agent assembles its context into a prompt, sends that prompt to the model, receives a decision, and either acts on the world through a tool or emits a message to another agent. Then it waits for the next input. This is precisely the shape of any networked service, and it is precisely the perceive-decide-act cycle of Section 29.1, with the decision delegated to a remote model.
Seeing the agent as a service immediately surfaces the distributed-systems questions. Where does the state live, and what happens to it if the process restarts? How long does the reasoning step block, and what is its tail latency, given that it is a call to a shared fleet with its own queue? What does the agent do when the model call times out or returns malformed output? These are not prompting questions; they are the questions Chapter 2 asked about any distributed component, and they have the same answers here. The agent is a component, the reasoning is a remote procedure call, and the rest follows.
What makes an LLM agent distributed is not that you can run several of them. It is that even a single agent's core operation, deciding what to do, is performed on a machine the agent does not own. Every reasoning step is a network round trip to the serving fleet of Chapter 24, with that fleet's latency, queueing, batching, and failure behavior. Once you accept that the "decide" in perceive-decide-act is a remote procedure call, the agent's cost model, latency budget, and failure handling are dictated by distributed-systems facts, not by prompt wording. This is why the chapter treats orchestration as systems engineering and not as prompt craft.
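To make the timeout and malformed-output questions concrete, here is a minimal sketch of a reasoning call wrapped the way any remote procedure call might be: validate the reply, retry on failure, and give up after a bounded number of attempts. Every name here (`guarded_call`, `validate_decision`, `ReasoningError`, `make_flaky_model`) is hypothetical, and the sketch assumes the model replies with a JSON string carrying `action` and `args` fields; a real client would raise its own exception types.

```python
import json
import time


class ReasoningError(Exception):
    """Raised when no valid decision arrives within the retry budget."""


def validate_decision(raw):
    """Parse a model reply and check it has the shape the agent loop expects."""
    decision = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(decision, dict) or "action" not in decision or "args" not in decision:
        raise ValueError("decision is missing 'action' or 'args'")
    return decision


def guarded_call(call_model, messages, retries=2, backoff_s=0.0):
    """Treat the reasoning step as a remote call: retry on timeout or bad output."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return validate_decision(call_model(messages))
        except (TimeoutError, ValueError) as err:
            last_error = err
            if attempt < retries:
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise ReasoningError(f"gave up after {retries + 1} attempts: {last_error}")


def make_flaky_model():
    """Hypothetical model client: times out, then garbles, then answers."""
    replies = iter([
        TimeoutError("serving queue too long"),
        "not json at all",
        '{"action": "finish", "args": {"answer": "ok"}}',
    ])

    def call_model(messages):
        reply = next(replies)
        if isinstance(reply, Exception):
            raise reply
        return reply

    return call_model


if __name__ == "__main__":
    print(guarded_call(make_flaky_model(), messages=[], retries=2))
```

With two retries the wrapper absorbs both failures and returns the third reply; with one retry it raises `ReasoningError`, which is the signal an orchestrator would need in order to contain the failure rather than let it propagate.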
2. A Single Agent, Built From Scratch (Beginner)
To make the component concrete before we connect many of them, we build the smallest honest agent: a perceive-reason-act loop that calls a model and one tool to finish a small task. The reasoning step is a stub that returns canned decisions so the loop runs offline and deterministically; mentally, that stub stands in for the network call to the serving fleet, and nothing in the loop changes when you swap the stub for a real client. The tool is a pure calculator. The task is to compute an order total. Watching the loop turn perceptions into actions is the whole point: it shows that an agent is a control loop around a reasoning call, and the rest of the chapter is about distributing that loop.
"""A single LLM agent as a perceive-reason-act loop. No network, no real LLM.
The LLM is a deterministic stub that returns canned decisions, and the one tool
is a pure-Python calculator. This makes the agent-as-component concrete."""
import json


def stub_llm(messages):
    """A stand-in for a call to the distributed LLM-serving fleet (Chapter 24).
    Real systems send `messages` over the network and wait; here we pattern-match
    the latest observation and return a canned decision, so the loop is offline."""
    last = messages[-1]["content"]
    if "TASK:" in last:
        return {"thought": "I must compute the total, so I will call the tool.",
                "action": "calculator", "args": {"expr": "(149.99 * 3) + 24.50"}}
    if "TOOL_RESULT:" in last:
        value = last.split("TOOL_RESULT:")[1].strip()
        return {"thought": "I have the number; I can answer now.",
                "action": "finish", "args": {"answer": f"The order total is {value}."}}
    return {"thought": "Unclear state.", "action": "finish", "args": {"answer": "?"}}


def calculator(expr):
    """The single tool this agent can act with: a pure, side-effect-free function.
    In a real system a tool is often a remote service behind its own network call."""
    # eval with builtins stripped is not a real sandbox; it is acceptable only
    # because the expression comes from our own offline stub.
    return str(round(eval(expr, {"__builtins__": {}}, {}), 2))


TOOLS = {"calculator": calculator}


def run_agent(task, max_steps=4):
    """The perceive-reason-act loop of Chapter 29, made into a service component."""
    messages = [{"role": "user", "content": f"TASK: {task}"}]
    for step in range(1, max_steps + 1):
        decision = stub_llm(messages)                      # REASON (the LLM call)
        print(f"step {step}: thought -> {decision['thought']}")
        action, args = decision["action"], decision["args"]
        if action == "finish":                             # ACT (terminal)
            print(f"step {step}: finish -> {args['answer']}")
            return args["answer"]
        result = TOOLS[action](**args)                     # ACT (tool call)
        print(f"step {step}: action -> {action}({json.dumps(args)}) = {result}")
        messages.append({"role": "assistant", "content": json.dumps(decision)})
        messages.append({"role": "user", "content": f"TOOL_RESULT: {result}"})  # PERCEIVE
    return None  # step budget exhausted without a final answer


if __name__ == "__main__":
    run_agent("Three items at 149.99 plus 24.50 shipping; what is the order total?")
Code 32.1.1: The stub_llm function is the only stand-in for the network; run_agent is the real control loop, and it would be unchanged if stub_llm were replaced by a client to the serving fleet of Chapter 24.

step 1: thought -> I must compute the total, so I will call the tool.
step 1: action -> calculator({"expr": "(149.99 * 3) + 24.50"}) = 474.47
step 2: thought -> I have the number; I can answer now.
step 2: finish -> The order total is 474.47.
Three things in Code 32.1.1 are worth dwelling on because they become the chapter's themes. First, the agent never computes anything itself except by acting through a tool; its only native ability is to assemble context and consult the model, which is why tool use (Section 32.2) is the agent's hands. Second, the loop is bounded by max_steps, because nothing guarantees that the model will ever choose to finish; a real orchestrator must cap and supervise these loops, a reliability concern (Section 32.10). Third, every iteration appends to messages, so the context grows; in a multi-agent system that growing context is shared state that several agents may read and write, which is the consistency problem of distributed memory (Section 32.7). One small loop already contains the seeds of communication, coordination, consistency, and reliability.
The stubbed model in Code 32.1.1 is the most reliable language model in this book: it never hallucinates, never rate-limits, and answers in zero milliseconds. The instant you replace it with a real one, you inherit a queue, a token bill, and an occasional confident wrong answer. Every difficulty in this chapter is, in some sense, the price of that one substitution.
3. Why Many Agents Make a Distributed System (Intermediate)
Place a second agent beside the first and the four classical concerns of distributed computing arrive together. Consider them in turn, because the chapter is organized around them. Communication is the first and the most unusual. In the systems of earlier parts a message was a packet, costed in microseconds and bytes; here a single message from one agent to another typically triggers a reasoning call, so its cost is a full LLM inference, costed in hundreds of milliseconds and in dollars. Communication, which the whole book has taught you to minimize, is now the dominant expense, and an agent graph with many edges is a graph with many expensive calls. If an agent makes $r$ reasoning calls to complete its part and there are $A$ agents, a naive all-to-all discussion can cost on the order of
$$C \approx A \cdot r \cdot c_{\text{call}}, \qquad r \text{ growing with the number of conversational rounds},$$

where $c_{\text{call}}$ is the cost of one model call. Doubling the number of agents or the number of rounds does not split work the way data parallelism did in Section 1.1; it multiplies cost. That asymmetry, more agents meaning more cost rather than less time, is why Section 32.10 treats topology as a budget decision.
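The cost expression is easy to tabulate. The sketch below evaluates $C \approx A \cdot r \cdot c_{\text{call}}$ for a few agent counts and round counts, under two assumptions that are ours rather than the text's: each agent makes exactly one reasoning call per round, so $r$ equals the number of rounds, and one call costs a hypothetical one cent. The function name `discussion_cost` is likewise illustrative.

```python
C_CALL = 0.01  # hypothetical dollars per reasoning call


def discussion_cost(agents, calls_per_agent, cost_per_call=C_CALL):
    """C ~= A * r * c_call: total spend for a naive multi-agent discussion."""
    return agents * calls_per_agent * cost_per_call


if __name__ == "__main__":
    print("agents  rounds  cost ($)")
    for agents in (2, 4, 8):
        for rounds in (1, 2, 4):
            # Assumes one call per agent per round, so r == rounds.
            print(f"{agents:6d}  {rounds:6d}  {discussion_cost(agents, rounds):8.2f}")
```

Reading down either column of the table shows the multiplication directly: doubling the agents at a fixed round count doubles the bill, and so does doubling the rounds at a fixed agent count.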
Coordination is the second concern. Several agents working on one task must agree on who does what and in what order, which is the coordination problem of Chapter 29 and the collective-behavior problem of Chapter 31, now with reasoning agents as the participants. Consistency is the third: when agents share a scratchpad, a plan, or a memory store, they can read stale versions and act on them, the same staleness that Chapter 2 analyzed for any replicated state. Reliability is the fourth: any agent's reasoning call can time out or return garbage, any tool can fail, and the orchestrator must detect, retry, and contain these failures, exactly the discipline of Chapter 35. None of these four is new. What is new is that the participants are LLM agents and the messages are reasoning calls, which changes the constants but not the questions.
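The staleness hazard can be shown in a few lines. The sketch below is one possible guard, not a prescribed design: a shared scratchpad that stamps every value with a version and rejects a write based on an outdated read. The names `Blackboard` and `StaleWrite` and the plan strings are hypothetical, and the sketch assumes a single process with no concurrent threads, so it needs no locking.

```python
class StaleWrite(Exception):
    """Raised when a writer's view of the shared state is out of date."""


class Blackboard:
    """A shared scratchpad whose writes are conditional on the version read."""

    def __init__(self, value=None):
        self._value = value
        self._version = 0

    def read(self):
        """Return the current value together with its version stamp."""
        return self._value, self._version

    def write(self, value, expected_version):
        """Accept the write only if nobody has written since the caller's read."""
        if expected_version != self._version:
            raise StaleWrite(
                f"expected v{expected_version}, board is at v{self._version}")
        self._value = value
        self._version += 1
        return self._version


if __name__ == "__main__":
    board = Blackboard(value="plan: draft reply")
    _, seen_by_a = board.read()   # agent A reads version 0
    _, seen_by_b = board.read()   # agent B reads version 0 as well
    board.write("plan: escalate to human", seen_by_a)  # A writes first
    try:
        board.write("plan: send refund", seen_by_b)    # B acts on a stale read
    except StaleWrite as err:
        print("rejected:", err)
        current, fresh = board.read()                  # B re-reads before deciding
        board.write(current + " (B agrees)", fresh)
    print(board.read())
```

Without the version check, agent B's write would silently overwrite agent A's plan. With it, the conflict surfaces as an explicit error that B can resolve by re-reading, at the price of one more reasoning step.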
The six axes of distribution from Section 1.1 ended with "distribute intelligence", and this chapter is where that final axis becomes concrete machinery. An agent's reasoning rides on distributed inference (Part V); its many instances coordinate using the multi-agent ideas of this part (Chapters 27 to 31); and the whole graph runs on the cluster and reliability infrastructure of Part VII. Distributed agent orchestration is therefore not a new island; it is the point where five earlier axes meet on top of the sixth. When you read the later sections, keep asking which earlier axis each one leans on. The answer is always at least one.
4. Orchestration Is a Distributed-Systems Discipline (Intermediate)
The central claim of this chapter, and the reason it sits in a distributed-systems book rather than a prompting guide, is that the quality of a multi-agent system is governed by its system design and not by the wording of its prompts. Two teams can give their agents identical prompts and reach opposite outcomes because one designed the communication topology, the state-sharing scheme, and the failure handling deliberately and the other left them implicit. A star topology with a single coordinator has different cost, latency, and failure characteristics than a fully connected debate; a shared blackboard has different consistency guarantees than per-agent private memory; a pipeline of specialists has different tail latency than a parallel ensemble. These are architecture choices, and they are the choices this book has spent thirty-one chapters teaching you to make.
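A rough count makes the topology difference tangible. The sketch below uses a deliberately simplified model that is our assumption: in a star, the coordinator sends one message to each other agent and receives one reply per round; in a fully connected debate, every agent sends one message to every other agent per round. The function names are illustrative.

```python
def star_messages(agents, rounds=1):
    """Coordinator sends to, and hears back from, each of the other agents."""
    return 2 * (agents - 1) * rounds


def mesh_messages(agents, rounds=1):
    """Every agent sends one message to every other agent in each round."""
    return agents * (agents - 1) * rounds


if __name__ == "__main__":
    for agents in (3, 5, 10):
        print(f"A={agents:2d}  star={star_messages(agents):3d}  "
              f"mesh={mesh_messages(agents):3d}")
```

Under this model the star grows linearly with the number of agents and the mesh quadratically: at ten agents a single round is 18 messages in the star and 90 in the mesh. The star pays for that saving by routing everything through one coordinator, which then becomes both the latency bottleneck and a single point of failure.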
This reframe pays off immediately in practice. When a multi-agent system is slow, the fix is usually not a better prompt but fewer reasoning rounds or more parallelism across agents. When it is expensive, the fix is a sparser communication topology, not a shorter system message. When it gives inconsistent answers, the cause is often two agents acting on divergent copies of shared state, a consistency bug, not a reasoning failure. Diagnosing agent systems with a distributed-systems vocabulary, latency and throughput, topology and consistency, retries and timeouts, is what makes them debuggable. The modern frameworks recognize this explicitly: they model an agent system as a graph of components with typed edges and shared state, which is to say, as a distributed system.
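The latency half of this claim can be observed with a toy simulation. The sketch below stands in for each reasoning call with a short `asyncio.sleep` and times three independent agents run in series and then concurrently. The 50 ms latency, the agent names, and the helper functions are all hypothetical, and real calls would contend for a shared fleet rather than sleep.

```python
import asyncio
import time

CALL_LATENCY_S = 0.05  # hypothetical stand-in for one reasoning call


async def reasoning_call(name):
    """Pretend to wait on the serving fleet, then return a labeled result."""
    await asyncio.sleep(CALL_LATENCY_S)
    return f"{name}: done"


async def run_sequential(names):
    """One call after another: latency is the sum of the calls."""
    return [await reasoning_call(n) for n in names]


async def run_parallel(names):
    """All calls in flight at once: latency is roughly the slowest call."""
    return list(await asyncio.gather(*(reasoning_call(n) for n in names)))


def timed(coro_fn, names):
    """Run one strategy to completion and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(names))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    agents = ["agent_a", "agent_b", "agent_c"]
    _, seq_s = timed(run_sequential, agents)
    _, par_s = timed(run_parallel, agents)
    print(f"sequential: {seq_s:.3f}s   parallel: {par_s:.3f}s")
```

Both strategies make the same three calls, so the bill is identical; only the critical path shrinks. The gain is available only when the agents do not depend on each other's output.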
A cluster of frameworks now treats multi-agent orchestration as explicit system construction rather than prompt chaining. Microsoft's AutoGen (Wu et al., 2023) models a system as conversable agents exchanging messages, making the communication graph a first-class object. LangGraph (LangChain, 2024) goes further and represents the system as a directed graph with persistent shared state and checkpointing, so an agent workflow becomes a stateful, recoverable computation in the sense of Chapter 2. CrewAI (2024) organizes agents into role-specialized crews with explicit task delegation, and OpenAI's Swarm, succeeded by the Agents SDK (2024 to 2025), centers on lightweight handoffs between agents as the coordination primitive. The convergence is telling: independent teams arrived at graphs, messages, shared state, and handoffs, the vocabulary of distributed systems, because that is what the problem is. We return to these engines as orchestrators in Section 32.8.
The hand-written control loop in Code 32.1.1 is what an orchestration framework gives you as a primitive. In LangGraph you declare the agent as a node, the tool as a node, and a conditional edge that routes back to reasoning until the agent finishes; the framework supplies the shared state object, the loop, and persistent checkpoints for free:
# pip install langgraph
from langgraph.graph import StateGraph, END

# call_model, run_tool, and saver are placeholders you supply: two node
# functions that take the state and return an update, and a checkpointer.
g = StateGraph(dict)                    # shared state flows along the edges
g.add_node("reason", call_model)        # the remote LLM step (our stub_llm)
g.add_node("act", run_tool)             # the tool step (our calculator)
g.set_entry_point("reason")
g.add_conditional_edges(                # route on the model's decision
    "reason",
    lambda s: "act" if s["action"] != "finish" else END)
g.add_edge("act", "reason")             # loop back to reason after acting
app = g.compile(checkpointer=saver)     # persistence and recovery for free
Who: A platform engineer at a SaaS company building an automated support-resolution system from LLM agents.
Situation: A five-agent pipeline (classify, retrieve, draft, review, send) answered tickets correctly in testing but cost forty cents per ticket and took eighteen seconds at the median, with a long tail past a minute.
Problem: The team's first instinct was to shorten and sharpen the prompts, assuming the latency and cost were a wording problem; two weeks of prompt tuning moved neither number meaningfully.
Dilemma: Keep optimizing prompts, the familiar lever, or step back and treat the pipeline as a distributed system whose topology and call count, not its wording, set the cost and latency.
Decision: They reframed it as a systems problem. Each agent was a remote reasoning call, so cost and latency were dominated by the number of calls and their critical-path length, not by token wording.
How: They ran classify and retrieve in parallel rather than in series, merged review into the draft step's single call with a self-check instruction, and cached retrieval so repeat tickets skipped a call entirely, reducing the critical path from five sequential calls to two.
Result: Median latency fell from eighteen to seven seconds, cost from forty to seventeen cents per ticket, and answer quality was unchanged, because nothing about the reasoning had been touched, only the system around it.
Lesson: When an agent system is slow or expensive, the lever is almost always the topology and the call count, not the prompt. Orchestration is a distributed-systems discipline, and it is debugged with a distributed-systems vocabulary.
We now have the chapter's foundation: the agent as a stateful distributed component whose reasoning is a remote procedure call (Sections 1 and 2), the four classical concerns that arrive the moment a second agent appears (Section 3), and the reframe that makes the rest of the chapter coherent, that orchestrating agents is systems engineering (Section 4). The next section gives the agent its hands, examining tool use and function calling as the mechanism by which an agent's reasoning reaches out and acts on the world. That mechanism, and the distributed calls it triggers, begins in Section 32.2.
Take the single agent of Code 32.1.1 and describe it as a networked service using the vocabulary of Chapter 2. Identify its state, its request and response, the one operation that is a remote procedure call, and at least three distinct failure modes that the in-memory stub hides but a real deployment would face. For each failure mode, name where in the chapter (by section topic) it is addressed.
Extend Code 32.1.1 so that run_agent returns the number of reasoning calls it made. Then write a small driver that runs three agents whose outputs feed each other in a chain (agent 1's answer becomes agent 2's task, and so on), and report the total reasoning-call count for the chain. Now change the chain into a round of debate where all three agents see each other's first answer and reason once more, and recount. Relate the two totals to the cost expression $C \approx A \cdot r \cdot c_{\text{call}}$ from Section 3, and state which topology you would choose under a fixed call budget.
Suppose a single reasoning call costs $c_{\text{call}} = 0.02$ dollars and takes $250$ milliseconds, and tool calls are free and instantaneous. For a task that needs $12$ reasoning calls total, compare two designs: (a) one agent making all $12$ calls sequentially, and (b) four agents making $3$ calls each, fully in parallel, plus one final coordinator call to merge. Compute the dollar cost and the critical-path latency of each. Explain why parallelism across agents lowers latency but not cost, contrasting this with the data-parallel speedup of Section 1.1, and state the condition under which design (b) is worth its extra coordinator call.