Section 32.3: Planner-Executor and Role-Specialized Agents

"My planner handed me one clean subtask and called it delegation. The other three executors got the interesting parts. I have notes."
An Executor Routed Below Its Skill Level

Big Picture

The hard way to get a large language model to solve a complex task is to ask it for the whole answer in one breath; the reliable way is to split the task into smaller pieces, give each piece to an agent suited to it, and recombine the results. That split is not a new idea dressed in new clothes. It is exactly the distributed problem solving of classical distributed AI (Chapter 27): a coordinator decomposes a problem, specialists work the parts, and an assembly step produces the whole. This section gives that pattern its modern form for LLM agents. A planner agent decomposes and monitors; role-specialized executor agents carry out focused subtasks; a synthesizer assembles the result. We will see why decomposition makes each step easier and parallelizable (Section 32.4), what it costs in coordination overhead and single-point-of-failure risk, and the uncomfortable fact that a team of agents is frequently worse than one good agent.

In Section 32.1 we treated a single LLM agent as a distributed component, a perceive-reason-act loop you can place on a server and call over the network. In Section 32.2 we gave that component hands, through tool use and function calling, so it could act on the world beyond text. One capable agent with tools can already do a great deal. The question this section answers is what to do when the task is too large, too varied, or too error-prone for one agent to handle in a single reasoning chain: how to divide the work among several agents so the team is more reliable than any member alone.

The answer the field has converged on is structure, not raw model power. You impose a division of labor. One agent owns the plan and the assembly; other agents own narrow, well-specified pieces. This is the planner-executor pattern, and it is the LLM-era instance of a problem that distributed AI has studied for decades. Reading this section, keep one comparison in mind: everything here is the centralized-coordinator architecture of Chapter 27, with an LLM in the coordinator's chair and LLM agents as the problem solvers.

1. Decomposition Is the Old Idea, Now Run by an LLM Beginner

Distributed problem solving has always had the same three beats: decompose a task into subtasks, distribute the subtasks to solvers, and synthesize the partial solutions into a final answer. Classical distributed AI built these systems by hand, with task graphs and contract nets and explicit interfaces between solvers (Chapter 27). What is new is that a single general-purpose model can now perform any of the three beats from a natural-language description, so the decomposition itself can be generated rather than hard-coded.

Concretely, a planner agent receives the task and emits a plan: a list of subtasks, each with an instruction, an assigned solver, and (often) a note about which earlier subtasks it depends on. Executors carry out the subtasks. A synthesizer, frequently the planner again, folds the partial results into the deliverable. The reason this helps is not mysterious. A model asked to write a forty-page audit in one pass will lose the thread, repeat itself, and hallucinate structure; the same model asked to write one section, given the others as context, stays focused and reliable. Smaller, narrower subtasks fit the model's effective working context, are individually checkable, and, when independent, can run at the same time, which is the subject of Section 32.4.
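To make the shape of a plan concrete, here is a minimal sketch of a plan serialized as JSON and checked before anything executes. The field names (`id`, `role`, `instruction`, `depends_on`) and the role names are illustrative assumptions, not a standard schema; the check rejects unknown roles and dependencies on subtasks that have not yet appeared in the list.

```python
import json

# Hypothetical plan format: the field names and role names below are
# illustrative assumptions, not a standard schema.
KNOWN_ROLES = {"researcher", "coder", "critic", "writer"}

RAW_PLAN = """
[
  {"id": 1, "role": "researcher", "instruction": "Gather three facts.", "depends_on": []},
  {"id": 2, "role": "writer", "instruction": "Assemble the facts into a report.", "depends_on": [1]}
]
"""


def parse_plan(raw):
    """Parse a JSON plan and reject subtasks with unknown roles or with
    dependencies on ids that have not appeared earlier in the list."""
    plan = json.loads(raw)
    seen = set()
    for st in plan:
        if st["role"] not in KNOWN_ROLES:
            raise ValueError(f"subtask {st['id']}: unknown role {st['role']!r}")
        missing = [d for d in st["depends_on"] if d not in seen]
        if missing:
            raise ValueError(f"subtask {st['id']}: unmet dependencies {missing}")
        seen.add(st["id"])
    return plan


plan = parse_plan(RAW_PLAN)
print([(st["id"], st["role"]) for st in plan])   # [(1, 'researcher'), (2, 'writer')]
```

Validating the plan before delegation is a cheap guard: a generated decomposition can name a role that does not exist or depend on a subtask it never defined, and it is better to catch that before any executor is called.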

Key Insight: A Subtask Is a Smaller Prompt, and Smaller Prompts Are More Reliable

The benefit of decomposition is not that several models are smarter than one; it is the same model running on inputs it handles well. An LLM's reliability falls off as a task grows in length, ambiguity, and the number of distinct goals it must juggle at once. Decomposition trades one hard prompt for several easy ones, plus the cost of a plan and a synthesis step. The trade pays off precisely when the subtasks are genuinely easier than the whole and the coordination overhead is smaller than the reliability you buy back.

2. The Planner-Executor Pattern Beginner

In the planner-executor pattern the planner is a centralized coordinator. It alone holds the global view of the task; the executors see only their slice. After decomposing, the planner does not walk away. It monitors progress, checks whether each returned result is acceptable, and replans when a step fails or the world turns out differently than the plan assumed. This monitor-and-replan loop is what separates a planner from a one-shot task splitter: a good planner treats its first decomposition as a hypothesis to be revised, not a contract to be executed blindly.
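The monitor-and-replan loop can be sketched for a single subtask as follows. Every callable here is a hypothetical stand-in: in a real system `executor` and `revise` would be LLM calls and `accept` a validator or critic. The toy functions exist only so the loop runs without a model.

```python
def run_with_replanning(instruction, executor, accept, revise, max_attempts=3):
    """Run one subtask, check the result, and revise the instruction on rejection.

    All four callables are hypothetical stand-ins for LLM calls and validators."""
    history = []
    for _ in range(max_attempts):
        result = executor(instruction)
        history.append((instruction, result))
        if accept(result):                          # monitor: is the result acceptable?
            return result, history
        instruction = revise(instruction, result)   # replan: the plan was a hypothesis
    raise RuntimeError(f"subtask failed after {max_attempts} attempts")


# Toy stand-ins so the loop can be run without a model.
def toy_executor(instruction):
    return "three facts" if "exactly three" in instruction else "one fact"


def toy_accept(result):
    return result.startswith("three")


def toy_revise(instruction, result):
    return instruction + " Return exactly three facts."


result, history = run_with_replanning("Gather facts.", toy_executor, toy_accept, toy_revise)
print(result, "after", len(history), "attempt(s)")   # three facts after 2 attempt(s)
```

The bound on attempts matters: without it, a planner that keeps revising a subtask no executor can satisfy will loop indefinitely and spend tokens doing so.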

We can state the dependency structure precisely. A plan is a directed acyclic graph $G = (V, E)$ whose vertices $V$ are subtasks and whose edges $E$ encode "must finish before". A subtask becomes runnable once all its predecessors have completed, so the executors can process the graph in any order consistent with a topological sort, and any two subtasks with no path between them may run concurrently. The planner's job is to produce a $G$ whose subtasks are each easy, whose edges are few (more edges mean less parallelism and more waiting), and whose outputs recombine cleanly into the answer. Figure 32.3.1 shows the flow for one such graph and, beside it, the hierarchical variant we reach in Section 4.

Figure 32.3.1: Two faces of the same idea. On the left, a single planner decomposes a task into subtasks routed to role-specialized executors (researcher, coder, critic), whose outputs the synthesizer assembles; subtasks with no edge between them may run in parallel. On the right, the hierarchical variant of Section 4: a manager delegates to sub-team leads, each of which commands its own worker agents, pushing decomposition down through the tree.
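The runnable rule for the plan graph (a subtask may start once all its predecessors have finished) can be checked with Python's standard-library `graphlib`. This is a minimal sketch, not a scheduler: it groups subtasks into "waves" that could run concurrently. The example graph is an assumption chosen to match the four-subtask plan used in Code 32.3.1.

```python
from graphlib import TopologicalSorter   # standard library, Python 3.9+

# Each subtask id maps to the ids that must finish before it (its predecessors).
# Here 3 waits on 1, and 4 waits on 1, 2, and 3.
depends_on = {1: [], 2: [], 3: [1], 4: [1, 2, 3]}


def parallel_waves(graph):
    """Group subtasks into waves; every subtask in a wave is runnable at once."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()                         # raises CycleError if the plan is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # subtasks whose predecessors are all done
        waves.append(ready)
        sorter.done(*ready)                  # finishing a wave unlocks its successors
    return waves


print(parallel_waves(depends_on))   # [[1, 2], [3], [4]]
```

The number of waves is the length of the longest dependency chain, which is why adding edges costs parallelism: each extra "must finish before" constraint can push a subtask into a later wave.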

3. Role Specialization: Division of Labor by Expertise Intermediate

Decomposition splits the work; role specialization decides who does which piece. Instead of calling the same generic agent for every subtask, you give each agent a distinct role: a researcher with retrieval tools and a prompt that rewards thoroughness, a coder with a sandbox and a prompt that rewards correct, runnable output, a critic whose only job is to find flaws, a manager who delegates and integrates. Each role gets a tailored system prompt, a tailored tool set, and sometimes a different underlying model. The team then divides labor by expertise, and a subtask is routed to the role best equipped for it.

This is the same division of labor that classical multi-agent systems studied, now expressed through prompts and tool grants rather than hand-built modules (Chapter 29). It is worth noting that the centralized router here achieves by assignment what swarm systems achieve without any router at all: in response-threshold models of swarm intelligence (Section 31.8), each agent picks up whatever task its individual threshold makes it most responsive to, and specialization emerges from the collective with no coordinator deciding. The planner-executor pattern buys the same specialization with explicit control and pays for it with a coordinator that can become a bottleneck. The critic role earns its own treatment in Section 32.5, where debate and reflection turn one agent's output into another agent's input.
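A role, as described at the start of this section, is little more than a bundle of a system prompt, a tool grant, and a model choice. The sketch below shows one way to represent that bundle and route by tool grant. All prompts, tool names, and the model name are hypothetical placeholders, and routing on a single needed tool is a deliberate simplification of "the role best equipped for it".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """A role is a system prompt, a set of allowed tools, and a model choice."""
    name: str
    system_prompt: str
    tools: tuple = ()
    model: str = "default-model"     # placeholder name, not a real model id


# Hypothetical role table; prompts and tool names are illustrative only.
ROLES = {
    "researcher": Role("researcher", "Be thorough; cite every claim.", tools=("search",)),
    "coder": Role("coder", "Return runnable code only.", tools=("sandbox",)),
    "critic": Role("critic", "Find flaws; do not praise.", tools=("sandbox",)),
}


def route(needed_tool):
    """Return the names of roles whose tool grant covers the needed tool."""
    return sorted(name for name, role in ROLES.items() if needed_tool in role.tools)


print(route("sandbox"))   # ['coder', 'critic']
print(route("search"))    # ['researcher']
```

Keeping the tool grant in the role definition, rather than in the planner's prompt, means the router can refuse an assignment the role cannot actually carry out.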

Fun Note: The Critic Who Approves Everything Is Not a Critic

A recurring failure of role-specialized teams is the agreeable critic. Give an agent the role "critic" with a polite system prompt, and it will congratulate the coder on excellent work and approve a function that does not compile. The fix is to make the role adversarial in the prompt and, better, to give the critic a tool that actually runs the code, so its approval has to survive contact with an interpreter. A role is only as specialized as its prompt and tools make it; a label alone changes nothing.
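A tool-backed critic can be sketched in a few lines. The version below is a deliberately weak check under stated assumptions: it uses Python's built-in `compile()`, which parses source without executing it, so it catches syntax errors only. Actually running untrusted code would need a sandbox, which is outside this sketch. The `critic` function and its verdict strings are hypothetical.

```python
def compile_check(source):
    """Return (ok, message). Uses the built-in compile(), which parses the
    source without executing it, so this catches syntax errors only."""
    try:
        compile(source, "<candidate>", "exec")
    except SyntaxError as err:
        return False, f"rejected: {err.msg} (line {err.lineno})"
    return True, "compiles"


def critic(source):
    """Hypothetical tool-backed critic: approval requires passing the check."""
    ok, message = compile_check(source)
    return ("APPROVE" if ok else "REJECT", message)


good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b)\n    return a + b\n"     # missing colon after the signature

print(critic(good)[0])   # APPROVE
print(critic(bad)[0])    # REJECT
```

However polite its prompt, this critic cannot approve the second function, because the verdict comes from the tool rather than from the model's inclination to agree.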

The demo below makes decompose-delegate-synthesize concrete with a stub planner and four role-specialized executor stubs. There is no network and no real model; the planner pattern-matches the task into a fixed plan, the router sends each subtask to the matching executor, and dependent subtasks receive the outputs of their predecessors as context. The point is the control flow, the same flow a real planner-executor system runs with live LLM calls in each box.

from dataclasses import dataclass, field


@dataclass
class Subtask:
    """One unit of decomposed work: which role should do it, and on what input."""
    id: int
    role: str                       # which specialist handles it
    instruction: str                # the focused, narrow instruction
    depends_on: list = field(default_factory=list)   # ids that must finish first


def planner(task):
    """Stub PLANNER: the centralized coordinator of Chapter 27. A real planner is
    an LLM that emits a plan as JSON; here we pattern-match the task and return a
    fixed decomposition with an explicit dependency (synthesis waits on research)."""
    if "report" in task.lower():
        return [
            Subtask(1, "researcher", "Gather three facts about distributed agents."),
            Subtask(2, "coder", "Write a one-line pseudo-API for delegating a subtask."),
            Subtask(3, "critic", "Name the main risk of a single planner.", depends_on=[1]),
            Subtask(4, "writer", "Assemble facts, API, and risk into a report.",
                    depends_on=[1, 2, 3]),
        ]
    return [Subtask(1, "writer", f"Answer directly: {task}")]


# Each executor is a NARROW specialist with its own canned behavior. In a real
# system each is a separate agent with a tailored prompt and its own tools.
def researcher(instr, ctx):
    return "facts=[planners decompose, executors specialize, decomposition is exact]"

def coder(instr, ctx):
    return "api: delegate(role, instruction) -> result"

def critic(instr, ctx):
    return "risk: the planner is a single point of failure and a bottleneck"

def writer(instr, ctx):
    if not ctx:                                   # the single-agent fallback path
        return f"[direct] {instr}"
    parts = " | ".join(f"{k}:{v}" for k, v in sorted(ctx.items()))
    return f"REPORT << {parts} >>"


EXECUTORS = {"researcher": researcher, "coder": coder,
             "critic": critic, "writer": writer}


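def orchestrate(task):
    """DECOMPOSE -> DELEGATE -> SYNTHESIZE. Routes each subtask to its
    role-specialized executor, passing the outputs of finished dependencies as
    that executor's context. This stub assumes the planner lists subtasks in
    dependency order and does no monitoring; a real planner would check each
    result here and replan on failure."""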
    plan = planner(task)
    print(f"planner: decomposed into {len(plan)} subtask(s)")
    results = {}
    for st in plan:
        ctx = {results[d][0]: results[d][1] for d in st.depends_on}   # gather deps
        out = EXECUTORS[st.role](st.instruction, ctx)                 # DELEGATE
        results[st.id] = (st.role, out)
        dep = f" (used {st.depends_on})" if st.depends_on else ""
        print(f"  subtask {st.id} -> {st.role:<10}{dep}: {out}")
    final = results[plan[-1].id][1]                                   # SYNTHESIZE
    print(f"synthesizer: {final}")
    return final


if __name__ == "__main__":
    print("=== multi-agent task (decomposed) ===")
    orchestrate("Produce a short report on planner-executor agents.")
    print("\n=== simple task (single agent wins) ===")
    orchestrate("What is 2 plus 2?")

Code 32.3.1: A planner-executor system in pure Python. The planner decomposes a task into a list of typed Subtasks, orchestrate delegates each to its role-specialized executor and threads dependency outputs through as context, and the last subtask's result is the synthesized deliverable. Swap each stub for a real LLM call and the control flow is unchanged.

=== multi-agent task (decomposed) ===
planner: decomposed into 4 subtask(s)
  subtask 1 -> researcher: facts=[planners decompose, executors specialize, decomposition is exact]
  subtask 2 -> coder     : api: delegate(role, instruction) -> result
  subtask 3 -> critic     (used [1]): risk: the planner is a single point of failure and a bottleneck
  subtask 4 -> writer     (used [1, 2, 3]): REPORT << coder:api: delegate(role, instruction) -> result | critic:risk: the planner is a single point of failure and a bottleneck | researcher:facts=[planners decompose, executors specialize, decomposition is exact] >>
synthesizer: REPORT << coder:api: delegate(role, instruction) -> result | critic:risk: the planner is a single point of failure and a bottleneck | researcher:facts=[planners decompose, executors specialize, decomposition is exact] >>

=== simple task (single agent wins) ===
planner: decomposed into 1 subtask(s)
  subtask 1 -> writer    : [direct] Answer directly: What is 2 plus 2?
synthesizer: [direct] Answer directly: What is 2 plus 2?

Output 32.3.1: The report task decomposes into four subtasks. The critic receives the researcher's facts as context (its dependency), although the stub ignores them where a real critic would read them, and the writer assembles all three predecessors: exactly the decompose-delegate-synthesize flow. The trivial arithmetic task collapses to a single subtask, the planner's own way of recognizing that one agent suffices, which is the lesson of Section 5.

Thesis Thread: Decompose and Synthesize, Returning Once More

The decompose-distribute-synthesize shape in Output 32.3.1 is the same shape this book has used since its first pages. It is MapReduce, where a map splits the data and a reduce assembles the partials (Chapter 6). It is data-parallel training, where workers compute partial gradients and all-reduce combines them (Chapter 15). Here the unit of work is a natural-language subtask and the combiner is an LLM rather than a sum, but the architecture is identical: a coordinator partitions, specialists compute in parallel, and an assembly step makes them one result. The planner-executor pattern is distributed problem solving wearing the book's oldest pattern.

4. Hierarchical Agent Teams Intermediate

A single planner with a flat row of executors works until the task is large enough that the plan itself becomes the bottleneck: too many subtasks for one coordinator to hold in context, too many distinct kinds of expertise to manage from one prompt. The remedy is the hierarchical, or hybrid, architecture of Chapter 27. A top-level manager agent decomposes the task into a few large chunks and delegates each to a sub-team lead, which is itself a planner for its chunk, decomposing further and delegating to its own workers. Decomposition recurses down the tree, as the right side of Figure 32.3.1 shows, and results flow back up, each level synthesizing its children's outputs before reporting to its parent.

Hierarchy buys two things. It bounds the context any one agent must hold, since each node reasons only about its own subtree, and it localizes failure and replanning, since a sub-team can retry internally without the manager replanning the whole task. It costs depth: every level adds a round of decomposition and synthesis, so a deep tree pays more coordination latency before any real work happens. The design question is how to balance the branching factor and depth so that subtasks reach a size an executor handles well without burying the actual work under layers of managers who only delegate.
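The trade between branching factor and depth can be put in rough numbers. The sketch below counts agents in a full tree under one modeling assumption that is ours, not the text's: each coordinator level costs one decomposition round on the way down and one synthesis round on the way up.

```python
def tree_cost(branching, depth):
    """Count agents in a full tree where every coordinator has `branching`
    children and workers sit `depth` levels below the root.

    Modeling assumption: each coordinator level costs one decomposition round
    on the way down and one synthesis round on the way up."""
    workers = branching ** depth
    coordinators = sum(branching ** level for level in range(depth))
    sequential_rounds = 2 * depth
    return {"workers": workers, "coordinators": coordinators,
            "sequential_rounds": sequential_rounds}


# Two ways to reach 16 workers: one flat planner, or a two-level hierarchy.
print(tree_cost(branching=16, depth=1))
# {'workers': 16, 'coordinators': 1, 'sequential_rounds': 2}
print(tree_cost(branching=4, depth=2))
# {'workers': 16, 'coordinators': 5, 'sequential_rounds': 4}
```

Under this model the flat planner must hold sixteen subtasks in one context, while the hierarchy caps every coordinator at four children at the price of four extra coordinators and two extra sequential rounds before any result reaches the top.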

5. When One Agent Beats a Team Advanced

The most important thing to say about multi-agent teams is that they are not free and frequently not worth it. Every edge in the plan graph is a handoff, and every handoff is a chance to drop context, misroute a subtask, or propagate an error. Decomposition has real costs, and a clear-eyed practitioner accounts for all of them before reaching for a team. The chief costs are these.

Coordination overhead. The planner and synthesizer are extra LLM calls that do no domain work; on a small task they can cost more tokens and latency than the task itself.
The planner as bottleneck and single point of failure. Every subtask flows through one coordinator. If the plan is wrong, the whole run is wrong, and no executor can fix a flawed decomposition from inside its narrow slice.
Error propagation. A mistake in an early subtask flows downstream into every step that depends on it. If each of $n$ sequential subtasks succeeds independently with probability $p$, the chain's success probability is $p^{n}$, which decays fast: at $p = 0.9$ and $n = 5$ the chain succeeds only about $59\%$ of the time. More steps can mean less reliability, not more.
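The compounding in the error-propagation item is easy to check numerically. This short sketch evaluates $p^{n}$ under the same independence assumption stated above; the particular values of $p$ and $n$ beyond the $p = 0.9$, $n = 5$ case are chosen here for illustration.

```python
def chain_success(p, n):
    """Probability that n sequential subtasks all succeed, assuming each
    succeeds independently with probability p."""
    return p ** n


for p in (0.90, 0.95, 0.99):
    print(f"p={p:.2f}  n=5: {chain_success(p, 5):.3f}  n=10: {chain_success(p, 10):.3f}")
```

The first row reproduces the roughly $59\%$ figure for $p = 0.9$ and $n = 5$; the remaining rows show how strongly the outcome depends on per-step reliability, which is the lever decomposition is meant to pull.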

So a single capable agent often wins. When the task fits comfortably in one model's context, has a single kind of expertise, and needs no parallelism, the orchestration overhead is pure loss: you pay for a planner and a synthesizer to wrap a job one agent could have done directly. The output of Code 32.3.1 makes this visible: the arithmetic task collapses to one subtask, because the right number of agents for a simple task is one. A multi-agent team is a tool for tasks whose decomposition genuinely buys reliability or speed that exceeds its coordination tax, and measuring whether that tax is paid back is the subject of Section 32.9.

Practical Example: The Five-Agent Pipeline That Lost to One Prompt

Who: An applied-AI team building an automated code-migration assistant for a fintech platform.

Situation: They built a role-specialized pipeline (planner, researcher, two coders, critic) to port small services from one framework to another.

Problem: On a held-out set of real migrations the pipeline succeeded on $58\%$ of tasks, while a single well-prompted agent with the same tools hit $74\%$, at a fifth of the token cost.

Dilemma: Keep the elegant multi-agent design that everyone had invested in, or admit that for these task sizes one agent was simply better and cheaper.

Decision: They measured per-stage error rates and found the chain was the problem: each handoff dropped a little context, and a five-step chain at roughly $0.9$ per-step reliability succeeds only about $0.9^{5} \approx 59\%$ of the time, in line with the observed $58\%$ and well short of the single agent's $74\%$.

How: They collapsed the team to one agent for small migrations and reserved the planner-executor pipeline for large multi-service migrations, where genuine parallel subtasks and distinct expertise made the coordination tax pay off.

Result: Overall success rose to $79\%$ and cost fell, because the team was now used only where decomposition actually bought something.

Lesson: Multi-agent is not a default; it is an optimization that must clear the bar set by a single good agent. Measure the baseline before you build the team.

Research Frontier: Planning Agents and Role-Specialized Teams (2024 to 2026)

Three threads define the current frontier. The first is software-development teams of role-specialized agents: MetaGPT (Hong et al., 2024) encodes standardized operating procedures so a product-manager, architect, engineer, and tester agent collaborate through structured documents, and ChatDev (Qian et al., 2024) runs a chat-driven "virtual software company" of conversing role agents. The second is explicit planning: plan-and-execute architectures (in the lineage of ReWOO and LLMCompiler) separate a planning pass from execution so the plan can be checked, parallelized, and replanned, and AutoGen (Wu et al., 2024) provides a general conversable-agent framework for assembling such teams. The third is the sober counter-current: careful evaluations report that multi-agent systems often fail to beat a strong single agent and analyze why (the MAST taxonomy of multi-agent failure modes, Cemri et al., 2025), pushing the field from "more agents" toward "the right structure, measured". We take up evaluation directly in Section 32.9 and the debate-and-critique thread in Section 32.5.

Library Shortcut: CrewAI and AutoGen Provide Roles, Planners, and Delegation

Code 32.3.1 hand-built the plan, the router, and the dependency threading. Production agent frameworks give you those primitives directly. CrewAI models a "crew" of role-specialized agents and a task list with dependencies; AutoGen models conversable agents with a manager that delegates and a group-chat that routes. The roughly seventy lines of orchestration above collapse to a declarative team definition:

# pip install crewai   (needs an LLM API key configured in the environment)
from crewai import Agent, Task, Crew, Process

researcher = Agent(role="Researcher", goal="Gather facts",
                   backstory="A thorough analyst.")      # tailored prompt + tools
writer = Agent(role="Writer", goal="Assemble a report",
               backstory="A precise technical writer.")

# Task requires expected_output; context=[t1] feeds t1's result into t2.
t1 = Task(description="Gather three facts about distributed agents.",
          expected_output="A list of three facts.", agent=researcher)
t2 = Task(description="Assemble the facts into a report.",
          expected_output="A short report.", agent=writer, context=[t1])

crew = Crew(agents=[researcher, writer], tasks=[t1, t2],
            process=Process.hierarchical,               # a manager plans + delegates
            manager_llm="gpt-4o")                       # example model name; the hierarchical process needs a manager
result = crew.kickoff()                                  # decompose -> delegate -> synthesize

Code 32.3.2: The same decompose-delegate-synthesize flow as Code 32.3.1, now declarative. The framework owns the planner, the routing, the dependency context (context=[t1]), the retries, and the LLM calls; you declare roles and the task graph. The hierarchical process adds the manager of Section 4; CrewAI additionally needs to be told which model (manager_llm) or agent plays that manager.

6. Putting the Pattern to Work Intermediate

We now have the modern instance of distributed problem solving for LLM agents: a planner that decomposes a task into a dependency graph and monitors it, role-specialized executors that handle focused subtasks by expertise, and a synthesizer that assembles the result, with hierarchy available when one coordinator is not enough. We also have the discipline that keeps the pattern honest, that a team must beat a single good agent before it earns its coordination tax, and that error propagation through a long chain can make more agents less reliable. The decomposition we drew here was a static graph executed top to bottom; it was deliberately sequential so the control flow stayed visible. The next section removes that restriction and runs the independent subtasks at the same time. Parallel and distributed multi-agent workflows, where the executors of Figure 32.3.1 run concurrently across machines, begin in Section 32.4.

Exercise 32.3.1: Map the Pattern to Its Ancestor Conceptual

The planner-executor pattern is the centralized-coordinator architecture of Chapter 27 with an LLM in the coordinator's seat. For each of the three classical concerns of that architecture, the coordinator as bottleneck, the coordinator as single point of failure, and the cost of decomposing and assembling, state how it reappears in an LLM planner-executor system and one concrete symptom you would observe in production. Then explain why the response-threshold swarms of Section 31.8 avoid the first two concerns entirely, and what they give up to do so.

Exercise 32.3.2: Add Hierarchy and Replanning Coding

Extend Code 32.3.1 in two ways. First, make one executor return a sentinel failure value (for example, the string "FAILED") for a particular instruction, and have orchestrate detect it and replan by re-routing that subtask to a different role, printing the recovery. Second, turn the flat plan into a two-level hierarchy: add a manager that emits two large chunks, each handed to a sub-planner that produces its own subtasks, and synthesize the chunk results at the top. Report how many LLM-equivalent calls the hierarchical version makes versus the flat one, and argue from that count when the hierarchy is worth its extra coordination rounds.

Exercise 32.3.3: When Does the Chain Break Even? Analysis

Model a decomposed task as a chain of $n$ sequential subtasks, each succeeding independently with probability $p$, so the chain succeeds with probability $p^{n}$. A single agent attempting the whole task succeeds with probability $q$. (a) For $p = 0.9$, find the largest $n$ for which the chain still beats a single agent with $q = 0.6$. (b) Decomposition usually raises the per-step reliability above the single-agent-on-the-whole reliability, because each subtask is easier; if decomposing lifts each step to $p = 0.97$, redo part (a). (c) Explain in two sentences why this calculation argues for fewer, larger, more reliable subtasks rather than many tiny ones, and connect it to the error-propagation cost in Section 5.