Part VI: Distributed AI and Multi-Agent Systems
Chapter 29: Multi-Agent Systems

What Is an Agent?

"They keep asking what I am. I perceive, I decide, I act, and I do it without anyone holding my hand. The rest is just argument about how many of me fit in a cluster."

An Agent Stuck Waiting on a Lock
Big Picture

An agent is the atom of every multi-agent system: an entity that perceives its environment, decides what to do, and acts back on that environment, all in pursuit of its own goals and without a controller dictating each step. Once you can name that atom precisely, the rest of this part of the book becomes a study of what happens when many of them share a world and must communicate, coordinate, compete, and agree. This section pins down the agent as a function from what it has seen to what it does next, names the four properties that separate an agent from an ordinary program, and shows that a modern language-model agent is exactly this classical loop wearing new clothes. That single definition is the unit on which Chapters 29 through 32 are built.

Every previous part of this book distributed a mechanical activity: data across storage nodes, gradients across training workers, a model across accelerators, requests across a serving fleet. Part VI distributes something harder to pin down, namely decision-making itself. To distribute decision-making we first need a precise unit of decision-making to distribute, and that unit is the agent. In Chapter 27 we saw distributed problem solving from the system's point of view; here we zoom all the way in to the single participant and ask what, exactly, makes it an agent rather than a function call or a thread.

The answer has been stable for decades and has suddenly become practical at industrial scale. An agent is anything that perceives its environment through sensors and acts upon that environment through actuators, choosing its actions so as to advance its goals. A thermostat is a minimal agent: it senses temperature and acts on a heater. A trading bot is an agent: it senses a market and acts by placing orders. A language-model assistant that reads a context window, reasons, and calls tools is an agent in precisely the same sense. The shared structure under all three is a single loop, repeated forever, that we examine next.

1. The Perceive-Decide-Act Loop Beginner

An agent runs one loop. It perceives the current state of its environment, it decides on an action by reasoning over what it has perceived, and it acts, changing the environment, which it then perceives again. Perception, decision, action, repeat. The thermostat does this once a second; a robot does it many times a second; a language-model agent does it once per turn. The loop is the same shape regardless of how sophisticated the deciding step is, and naming its three stages cleanly is the whole point of this section, because every later mechanism in this part of the book, from negotiation to consensus, is something that happens between the act of one agent and the perception of another. Figure 29.1.1 shows the loop for a single agent embedded in its environment, and on the right shows the move that defines the rest of Part VI: many such loops sharing one world.

A single agent in its environment Environment Perceive Decide Act sensors goal-directed reasoning actuators the loop repeats: each action reshapes what the agent perceives next Many agents, one shared world Shared environment Agent A Agent B Agent C one agent's action becomes another agent's percept
Figure 29.1.1: The agent and its multiplication. On the left, one agent runs the perceive-decide-act loop: its sensors read the environment, its reasoning chooses an action, its actuators change the environment, and the changed environment feeds the next perception. On the right, the move that defines this part of the book: place several such agents in one shared environment and each agent's action becomes part of what the others perceive, which is the origin of every coordination problem in Chapters 29 through 32.

The three stages map onto three questions an agent designer must answer separately. What can the agent sense, and how faithfully? How does it turn what it has sensed into a choice? What is it able to do, and how does each action change the world? Keeping these three concerns apart is the first discipline of agent design, because a weakness in any one of them, a blind sensor, a confused policy, or a clumsy actuator, limits the whole agent regardless of how strong the other two are.

Key Insight: An Agent Is a Function From Percept Histories to Actions

The cleanest formal definition strips away the implementation entirely. Let $P$ be the set of possible percepts and $A$ the set of possible actions. An agent is a function $f : P^{*} \to A$ that maps any finite sequence of percepts seen so far, the percept history, to the next action. Writing the history as $p_{1}, p_{2}, \ldots, p_{t}$, the agent emits $a_{t} = f(p_{1}, \ldots, p_{t})$. This says something strong: an agent's entire observable behavior is determined by what it has perceived, not by anything outside the loop. Internal memory, a learned model, a planning tree, or a language model are all just ways of computing $f$ compactly without storing the full history. Everything else in this section, autonomy, reactivity, the goal, is a property of how $f$ is shaped, not a departure from this form.

2. The Agent-Environment Boundary Beginner

The function $f$ only makes sense once we draw a line around the agent and call everything outside it the environment. That line, the agent-environment boundary, is a modeling choice rather than a law of nature, and choosing it well is half of agent design. Everything inside the boundary is the agent: its sensors, its decision procedure, its actuators, its memory. Everything outside is the environment: the world the agent perceives and acts upon, which crucially includes the other agents. The boundary is exactly the interface across which percepts flow inward and actions flow outward, and nothing else crosses it. An agent cannot reach into the environment except by acting, and cannot learn about it except by perceiving.

Where we place the boundary determines what counts as a percept and what counts as an action. Draw it around a single robot and the wheels are actuators; draw it around a warehouse of robots and a whole robot is one actuator of the larger system. This relativity is why the same physical setup can be modeled as one agent or many, and the decision to treat a system as multi-agent rather than as one monolithic agent is the founding choice of this entire part of the book. We model a system as many agents precisely when its components have their own goals and decide autonomously, because then no single function $f$ captures the whole and the interesting behavior lives in the interactions across boundaries.

Fun Note: Your Code Reviewer Is in the Environment

If you build a coding agent, the test suite it runs against is part of its environment, and so is the human who approves its pull request. From the agent's side of the boundary, a failing test and a frowning reviewer are both just percepts that arrive after an action. The agent never sees their inner workings, only the verdicts that cross the boundary, which is exactly why a good agent spends so much effort guessing what is on the other side.

3. The Four Properties of Agency Intermediate

Not every program that maps inputs to outputs deserves the name agent. The distinction is captured by four properties, and a system is usefully called an agent when it exhibits all four to some degree. The first is autonomy: the agent acts without constant external control, deciding its own actions from its own state rather than executing a script handed to it step by step. The second is reactivity: the agent perceives its environment and responds to changes in it in a timely way, rather than running open-loop on assumptions baked in at design time. The third is proactiveness: the agent does not merely react, it takes initiative, generating and pursuing goals rather than waiting to be prodded. The fourth is social ability: the agent interacts with other agents, through communication or through shared effects on the environment, to achieve what it cannot achieve alone.

These four pull in tension, which is what makes agent design interesting. A purely reactive agent responds fast but pursues nothing; a purely proactive agent chases its goal but ignores a world that has changed under it. A good agent balances reactivity and proactiveness, staying responsive to surprises while still driving toward its goal. Social ability is the property that this entire part of the book exists to develop, because it is the only one of the four that is meaningless for an agent alone: it appears only when there are other agents to interact with, which is the move from one agent to many. Table 29.1.1 summarizes the four properties and where each is developed.

Table 29.1.1: The four properties of agency, the question each answers, and where this book develops it. An entity is usefully called an agent when it shows all four to some degree.
PropertyThe question it answersWhere the book develops it
AutonomyDoes it act without being driven step by step?Section 29.2 (agent architectures)
ReactivityDoes it respond to a changing environment in time?Sections 29.2 to 29.3
ProactivenessDoes it take initiative toward its own goals?Sections 29.2, and MARL in Chapter 30
Social abilityDoes it interact with other agents to get more done?Sections 29.4 to 29.10, the heart of this chapter

Distinguishing an agent from an ordinary object or process sharpens all four. An object encapsulates state and exposes methods, but it has no goals: it does what it is told, when a method is called, and never acts on its own initiative, so it lacks autonomy and proactiveness. A process runs concurrently and may react to messages, but it too executes whatever logic it was given, pursuing no goal of its own. The agent's defining addition is goal-directed autonomy: it holds its own objective and selects actions to advance that objective, which is exactly why we cannot model a multi-agent system as a single program with many threads. Threads share one designer's intent; agents each carry their own.

4. Rational Agents and Expected Utility Intermediate

Saying an agent pursues a goal raises the question of what it means to pursue it well. The standard answer is rationality. A rational agent is one that, for every percept history, selects the action that maximizes its expected utility, where utility is a numerical measure of how good outcomes are from the agent's point of view. If the agent has a utility function $u$ over outcomes and a belief, given history and a candidate action $a$, about the probability $P(o \mid a)$ of each outcome $o$, then the rational choice is

$$a^{\star} = \arg\max_{a \in A} \; \mathbb{E}[\,u \mid a\,] = \arg\max_{a \in A} \; \sum_{o} P(o \mid a)\, u(o).$$

This is the same expected-utility principle that underlies decision theory, and it is the bridge to Chapter 28. With one agent, maximizing expected utility is a self-contained optimization: the agent need only model the environment. With many agents, the outcome of one agent's action depends on what the others do at the same time, so each agent is maximizing a utility that other rational maximizers are simultaneously trying to bend. That coupling is precisely a game, and the solution concepts of game theory, equilibria and the rest, are what rationality reduces to when the environment contains other rational agents. Chapter 28 built that machinery; this chapter spends it.

Real agents are rarely perfectly rational, and saying so is not an apology but a design fact. Sensors are noisy, computation is bounded, and the utility function is an imperfect proxy for what we actually want. The notion of a rational agent is a target and a yardstick, not a description: it tells us what the ideal action is, so that we can measure how far a buildable agent falls short and decide whether closing the gap is worth the cost. The agent in the runnable demo below is boundedly rational in exactly this way: it uses a cheap heuristic toward its goal rather than computing the truly optimal path, and it still gets there on its own.

5. A Goal-Seeking Agent From Scratch Beginner

The cleanest way to make the perceive-decide-act loop concrete is to build the smallest agent that genuinely exhibits all of autonomy, reactivity, and proactiveness, then watch it pursue a goal with no controller telling it what to do. The agent below lives in a small gridworld with walls. Crucially, it does not know the maze in advance: at each step it perceives only its four neighboring cells, decides which way to move by a goal-seeking heuristic that prefers unvisited cells, and acts by stepping there. The three methods perceive, decide, and act are the three stages of the loop, kept deliberately separate as Section 1 urged.

# A 6x6 gridworld with walls. '#' is wall, 'G' is the goal, '.' is open.
GRID = ["......", ".####.", ".#..#.", ".#.G#.", ".#..#.", "......"]
H, W = len(GRID), len(GRID[0])
GOAL, START = (3, 3), (0, 0)

def cell(r, c):                                  # what occupies a cell
    if r < 0 or r >= H or c < 0 or c >= W:
        return "#"                               # the world's edge acts as a wall
    return GRID[r][c]

class Agent:
    """Perceives four neighbours, decides toward the goal, acts by stepping."""
    def __init__(self, pos):
        self.pos = pos
        self.visited = {pos}                      # memory of where it has been

    def perceive(self):                           # PERCEIVE: sense the neighbours
        r, c = self.pos
        return {"up": (r-1, c, cell(r-1, c)), "down": (r+1, c, cell(r+1, c)),
                "left": (r, c-1, cell(r, c-1)), "right": (r, c+1, cell(r, c+1))}

    def decide(self, percept):                    # DECIDE: reason toward the goal
        gr, gc = GOAL
        best, best_key = None, None
        for key, (r, c, what) in percept.items():
            if what == "#":
                continue                          # never step into a wall (reactivity)
            if what == "G":
                return key                         # goal in reach: take it at once
            # minimise distance to goal; penalise revisits (proactive exploration)
            score = abs(r-gr) + abs(c-gc) + (5 if (r, c) in self.visited else 0)
            if best is None or score < best:
                best, best_key = score, key
        return best_key

    def act(self, action, percept):               # ACT: move through the actuator
        r, c, _ = percept[action]
        self.pos = (r, c)
        self.visited.add(self.pos)

# The loop: the agent runs it ALONE until it reaches the goal. No controller
# tells it which way to go at any step; that is its autonomy.
agent, trail = Agent(START), [START]
for step in range(1, 41):
    percept = agent.perceive()                    # PERCEIVE
    action = agent.decide(percept)                # DECIDE
    agent.act(action, percept)                    # ACT
    trail.append(agent.pos)
    if agent.pos == GOAL:
        print(f"goal reached at step {step}")
        break

print("steps taken        :", len(trail) - 1)
print("path               :", " to ".join(str(p) for p in trail))
print("distinct cells seen:", len(set(trail)))
Code 29.1.1: A goal-seeking agent in a gridworld, with the perceive-decide-act loop made explicit as three separate methods. The agent perceives only its four neighbors, never the whole maze, and still reaches the goal under its own control.
goal reached at step 10
steps taken        : 10
path               : (0, 0) to (1, 0) to (2, 0) to (3, 0) to (4, 0) to (5, 0) to (5, 1) to (5, 2) to (4, 2) to (3, 2) to (3, 3)
distinct cells seen: 11
Output 29.1.1: The agent reaches the goal in ten steps, navigating around the wall ring by following its own goal-seeking decisions. No external code chose its moves; the path is the trace of its autonomous loop.

Watch how the four properties appear in the run. Autonomy: nothing outside the loop chose the moves listed in Output 29.1.1; the agent picked each one from its own state. Reactivity: when the wall blocked the direct route, the decide method dropped the walled option and rerouted, so the agent goes down and around rather than into the barrier. Proactiveness: the revisit penalty in the score is the agent taking initiative to explore toward its goal rather than oscillating in place. Social ability is the one property absent here, because there is only one agent. Add a second agent to this same grid, perhaps one defending the goal or competing to reach it first, and every coordination question of the chapters ahead appears at once: that step from one loop to many is the subject of Section 29.2 and everything after it.

Library Shortcut: LangGraph and AutoGen Give You the Loop and the Many

Code 29.1.1 hand-builds one agent's loop in about forty lines. Production agent frameworks give you that loop, plus the machinery for many agents in one environment, as a few lines of declaration. In LangGraph you describe the agent as a graph of nodes (perceive, reason, act through tools) and the framework runs the loop, persists state, and handles retries and human-in-the-loop pauses. In Microsoft's AutoGen you declare several conversable agents and let them message one another to solve a task jointly:

# pip install pyautogen
from autogen import AssistantAgent, UserProxyAgent

planner = AssistantAgent("planner", llm_config={"model": "..."})   # one agent
coder   = AssistantAgent("coder",   llm_config={"model": "..."})   # another agent
human   = UserProxyAgent("human", human_input_mode="NEVER")        # acts in the world

# The framework runs each agent's perceive-decide-act loop AND routes their
# messages, so the social-ability property comes for free, not hand-coded.
human.initiate_chat(planner, message="Build and test a gridworld agent.")
Code 29.1.2: The same agent abstraction, now from a framework. The forty lines of Code 29.1.1, multiplied across several cooperating agents, collapse to a handful of declarations; AutoGen supplies each agent's loop and the message routing that gives them social ability.
Practical Example: The Support Bot That Earned the Name Agent

Who: A platform engineer at a SaaS company rebuilding the customer-support automation.

Situation: The existing support bot was a decision tree: it matched the user's message against canned patterns and returned a fixed reply, with no memory and no ability to take action.

Problem: Customers asked compound questions ("why was I charged twice and can you refund one?") that the tree could neither understand nor resolve, so most chats escalated to a human.

Dilemma: Keep extending the tree, cheap and predictable but forever brittle and never autonomous, or rebuild around a real agent that perceives the full conversation, decides on a plan, and acts through tools (look up the account, issue a refund), at the cost of giving software the authority to act.

Decision: They rebuilt it as an agent, because the binding limitation was the absence of autonomy and the inability to act, not a shortage of patterns; no number of new branches would let a decision tree call the refund API on its own judgment.

How: A language-model agent ran the perceive-decide-act loop over each conversation, with billing and account lookups exposed as tools and refunds above a threshold gated behind a human approval that the agent perceived as a percept.

Result: Containment rose sharply because the agent resolved compound requests end to end, and the human approval gate kept it from acting beyond its authority, exactly the agent-environment boundary of Section 2 drawn on purpose.

Lesson: The jump from automation to agent is not more rules; it is autonomy plus the power to act. Name which of the four properties your system lacks, and you will know whether you need a bigger decision tree or a real agent.

6. Why Many Agents, and the LLM Agent Today Intermediate

A single agent, however capable, is bounded by what one perceive-decide-act loop can sense, decide, and do. A multi-agent system is a set of many such agents that share an environment and whose interactions produce behavior no single agent was programmed with. This is the same emergence we have seen mechanically throughout the book, partial gradients combining into one exact gradient in Chapter 1, now lifted to the level of decisions: many local choices combining into a global outcome. We move to many agents for the same three reasons we moved to many machines anywhere in this book. Some problems are inherently distributed (sensors and effectors live in different places); some are too large for one agent's loop to handle in time; and some demand robustness that no single point of decision can provide. The difference is that the units now have goals of their own, so combining their decisions is not a sum but a negotiation, which is why this chapter spends Sections 29.4 through 29.10 on communication, coordination, and consensus.

What has made this classical theory suddenly urgent is that the modern language-model agent is exactly the perceive-decide-act loop, instantiated with a powerful reasoner in the decide step. An LLM agent perceives by reading its context window (the conversation, retrieved documents, tool results), it decides by reasoning over that context to choose a next step, and it acts by emitting a tool call or a message, whose result returns as the next percept. That is Figure 29.1.1 with a language model in the middle box. Because the reasoner is now general enough to handle open-ended goals, the decades-old multi-agent theory of coordination, negotiation, and task allocation has become directly buildable rather than academic. When several of these LLM agents are placed in one environment and made to cooperate, we get a distributed system of reasoners, and orchestrating that system at scale, with shared memory, role assignment, and fault tolerance, is the subject of Chapter 32.

Thesis Thread: Distributing the Last Thing Left, Decisions

This book's spine is the distribution of essential activities across many machines: data, training, the model, inference. Part VI distributes the final one, decision-making itself, and the agent is its unit. Every mechanism in Chapters 29 through 32, message protocols, coordination, negotiation, consensus among agents, is the decision-level analogue of a collective communication primitive from Chapter 4. Where workers all-reduce a gradient to agree on one number, agents negotiate and reach consensus to agree on one decision. Whenever you meet a multi-agent coordination method ahead, ask what its agents are agreeing on and at what cost, exactly the question we asked of every collective in Part I.

Research Frontier: LLM Agents and Agentic AI (2023 to 2026)

The perceive-decide-act agent is the organizing abstraction of the current wave of agentic AI. The ReAct pattern (Yao et al., 2023) interleaves reasoning traces with tool actions, making the decide and act stages explicit in a single language-model loop, and Reflexion (Shinn et al., 2023) adds a memory of past outcomes so the agent improves across episodes. Multi-agent frameworks then multiply the loop: AutoGen (Wu et al., 2023) and CAMEL (Li et al., 2023) put several conversable agents in one environment, and Generative Agents (Park et al., 2023) showed dozens of LLM agents producing believable emergent social behavior in a shared sandbox, a vivid demonstration of social ability at scale. Surveys through 2024 and 2025 (for example Wang et al., 2024) catalogue the explosion of LLM-agent architectures, while frameworks such as LangGraph and CrewAI harden the loop for production. The throughline is that classical multi-agent theory, autonomy, coordination, negotiation, is being rediscovered as the design language for systems of language-model agents, which is why this chapter's definitions are worth getting exactly right.

We now have the unit. An agent is a function from percept histories to actions, running a perceive-decide-act loop, exhibiting autonomy, reactivity, proactiveness, and social ability, choosing actions to maximize its expected utility, and separated from its world by a boundary we draw on purpose. A multi-agent system is many such units sharing one world. Everything ahead is about what those units do to each other across their boundaries. The next section opens the agent's decide box and asks how it should be built, contrasting reactive, deliberative, and hybrid architectures, in Section 29.2.

Exercise 29.1.1: Name the Missing Property Conceptual

For each system, decide which of the four properties of agency (autonomy, reactivity, proactiveness, social ability) it has and which it lacks, and say whether you would call it an agent: (a) a cron job that runs a backup script every night at 2 a.m.; (b) a spreadsheet macro that recomputes when a cell changes; (c) a vacuum robot that explores a room and returns to its dock when low on charge; (d) one trading bot in a market full of other bots reacting to its orders. For any system you judge not to be an agent, state the single property whose absence is decisive, and explain why adding more rules would not supply it.

Exercise 29.1.2: Give the Gridworld Agent a Rival Coding

Extend Code 29.1.1 to two agents in the same grid that both seek the goal, taking turns to move; the first to reach the goal wins, and a cell may hold at most one agent at a time. Each agent perceives the other only when it is in an adjacent cell. Run it and report which agent wins and in how many steps. Then make the agents social: let an agent that perceives the other nearby change its route to avoid a collision. Describe one situation your two agents now handle that the single agent of Code 29.1.1 could not, and connect it to the social-ability property of Section 3.

Exercise 29.1.3: Where Does the Boundary Go? Analysis

Consider a fleet of warehouse robots managed by a central dispatcher that assigns each robot a destination. Model the system two ways: first as a single agent (the dispatcher) whose actuators are the robots, and second as many agents (each robot) sharing an environment with the dispatcher as one more agent. For each model, write down what the percepts and actions are and where the agent-environment boundary lies. Argue from the four properties which model is more faithful when the robots can fail or sense local obstacles the dispatcher cannot see, and relate your answer to the centralized-versus-decentralized choice from Chapter 27.