Part VI: Distributed AI and Multi-Agent Systems
Chapter 32: Distributed Agent Orchestration

Shared State and Distributed Memory

"I read the plan, I did my part, I wrote it back. By the time my pen touched the page, four colleagues had already rewritten the plan. Now we have five plans and one very confused team."

An Agent Stuck Waiting on a Lock
Big Picture

A multi-agent system is only as coherent as the memory its agents share; the moment several agents read and write one common context, you have rebuilt distributed shared state, with all of its consistency, contention, and staleness problems. An individual agent reasons over a finite context window (its working memory) and retrieves facts from an external store on demand (its long-term memory). A team of agents adds a harder ingredient: a shared scratchpad, blackboard, or message history that everyone reads and everyone updates so the team carries common context. That shared object is not a convenience bolted onto the side of the system; it is distributed shared state, and whether it stays consistent is frequently what decides whether a multi-agent system works at all. This section names the kinds of agent memory, shows why the shared kind is the classic distributed-state problem in new clothing, and demonstrates an inconsistency and its fix in runnable code.

An agent that cannot remember is a calculator that happens to speak. Everything interesting an agent does (following a multi-step plan, recalling what a tool returned three turns ago, knowing a user's name) rests on memory. For a single agent the question is where that memory lives and how it is brought back into view at the moment of reasoning. For a team of agents a second question appears, the one this section is about: how do several agents hold a common picture of the world so that their actions add up to coordinated behavior instead of contradiction? The answer turns out to be the same shared-state machinery the rest of this book has been building, applied to the context that drives a language model.

We have met this shape before. The modern agent blackboard is the lineal descendant of the blackboard architecture introduced for distributed problem solving in Section 27.4: a shared workspace where independent knowledge sources post partial results and read each other's contributions. What is new is that the knowledge sources are now language-model agents, the postings are natural-language context, and the cost of getting consistency wrong is not a stale cache entry but an incoherent team that argues with itself.

[Figure: Shared blackboard memory: many agents, one common context. Planner, Researcher, and Writer agents, each with its own working context, read and write a central blackboard (shared scratchpad + message history) through a consistency guard (lock). The blackboard retrieves from and persists to long-term memory (episodic + semantic, external vector store). Divergent reads of the blackboard produce an incoherent team; the guard serializes writes so all agents converge.]
Figure 32.7.1: The agent-team memory picture. Each agent reasons over its own finite working context (left) but reads and writes a central blackboard that holds the team's shared context (center). The blackboard is backed by an external long-term memory retrieved on demand via vector search (right, the retrieval machinery of Chapter 25). The consistency guard on the write path is the synchronization this section is about: without it, agents that read divergent views of the blackboard act incoherently.

1. The Four Kinds of Agent Memory Beginner

Before the multi-agent twist, fix the vocabulary for a single agent. Practitioners borrow the cognitive-science taxonomy, and it maps cleanly onto distinct storage mechanisms. Table 32.7.1 lists the four kinds, what each holds, and where it physically lives, because the storage location is what determines the distributed-systems problem each one raises.

Table 32.7.1: The four memory types of an agent, the substrate each lives on, and the binding constraint each imposes.
| Memory type | What it holds | Where it lives | Binding constraint |
|---|---|---|---|
| Working (short-term) | The current task, recent turns, tool outputs in view | The model's context window | Finite tokens; the hard ceiling |
| Long-term | Everything too large to keep in context | External store, retrieved on demand | Retrieval quality and latency |
| Episodic | Past interactions and trajectories | Logged transcripts, vector-indexed | Recall of the right past episode |
| Semantic | Facts and durable knowledge | Knowledge base or vector store | Freshness and correctness |

Working memory is the context window, and it is the binding constraint on everything an agent does. It is finite: a model can attend to only so many tokens, and every token in view is paid for at every call. Long-term memory is the escape hatch. Anything that does not fit in the window lives in an external store and is pulled back in on demand by vector search, which is exactly the retrieval-augmented generation of Chapter 25 applied to an agent's own history rather than a document corpus. Episodic memory is the special case of long-term memory that stores past trajectories ("the last time I tried this tool it failed this way"); semantic memory stores durable facts ("the user's deployment region is eu-west"). The two differ in what they index and how they are written, but both are retrieved the same way: embed the current situation, search for the nearest stored items, and splice the top matches back into the finite window.
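The embed, search, splice loop can be sketched without any external library. This is a minimal illustration under stated assumptions, not a production retriever: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `MemoryStore`, `remember`, and `recall` are illustrative names that do not come from any framework.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class MemoryStore:
    """Hypothetical long-term memory: stores (vector, text) pairs and ranks by similarity."""
    def __init__(self):
        self._items = []

    def remember(self, text):
        self._items.append((embed(text), text))

    def recall(self, query, k=2):
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

store = MemoryStore()
store.remember("the user's deployment region is eu-west")          # semantic fact
store.remember("the search tool timed out on the last attempt")    # episodic trace
store.remember("the user prefers weekly summary emails")

# Embed the current situation, fetch the nearest item, splice it into the window.
top = store.recall("which region is the deployment in", k=1)
prompt = "Relevant memory:\n" + "\n".join(top) + "\n\nTask: pick a region for the new service."
print(prompt)
```

Only the single best match enters the prompt; the other two memories stay in the store and cost no tokens, which is the point of retrieving around a finite window.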

Key Insight: Working Memory Is the Ceiling; Everything Else Is Retrieval Around It

An agent's intelligence at any instant is bounded by what fits in its context window. Every other memory type exists to manage that scarcity: long-term, episodic, and semantic memory are all just disciplined ways of deciding which absent facts to retrieve into the window now and which to leave out. Design an agent's memory by asking, at each step, "what is the minimum context that makes the next decision correct?" The rest belongs in an external store, fetched only when its embedding says it is relevant.

2. The Multi-Agent Twist: Memory Becomes Shared State Intermediate

Give one agent the four memories above and you have a capable assistant. Now put several agents on one task and ask them to coordinate. They need a common context: the planner's decomposition, the researcher's findings, the writer's draft, all visible to the team so each agent acts on the same picture. The standard way to provide it is a shared object that every agent reads and writes, called a scratchpad, a blackboard, or simply the shared message history. The instant that object exists, it is distributed shared state, and it inherits the three problems every distributed shared state has had since Section 2.5 made consistency precise.

The first problem is consistency. If two agents hold divergent copies of the shared context, they reason from different premises and their actions do not compose; the team behaves incoherently even though each agent is individually correct. This is the distributed-belief inconsistency of Section 27.8 reappearing at the implementation layer: divergent state is divergent belief. The second problem is contention. When several agents write the shared object concurrently, their updates race, and a naive read-modify-write loses writes (the lost-update anomaly we are about to reproduce). The third is staleness. An agent that read the blackboard a moment ago may act on a view that newer writes have already invalidated, the same bounded-staleness tension that parameter servers manage when workers pull slightly old parameters.
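One way to make staleness and contention visible is to attach a version number to every shared key and reject any write based on an outdated read. The sketch below uses optimistic concurrency (compare-and-set), a standard technique that we substitute here purely for illustration; `VersionedBlackboard` and `compare_and_set` are hypothetical names, not part of any agent framework.

```python
import threading

class VersionedBlackboard:
    """Each key carries a version; a write succeeds only if the writer saw the latest one."""
    def __init__(self):
        self._store = {}                # key -> (version, value)
        self._lock = threading.Lock()   # guards only the short check-and-swap

    def read(self, key, default=None):
        with self._lock:
            return self._store.get(key, (0, default))

    def compare_and_set(self, key, expected_version, value):
        with self._lock:
            version, _ = self._store.get(key, (0, None))
            if version != expected_version:
                return False            # stale: someone wrote since our read
            self._store[key] = (version + 1, value)
            return True

board = VersionedBlackboard()
v_a, plan_a = board.read("plan", default=[])
v_b, plan_b = board.read("plan", default=[])    # both agents read version 0

ok_a = board.compare_and_set("plan", v_a, plan_a + ["research"])
ok_b = board.compare_and_set("plan", v_b, plan_b + ["draft"])    # rejected as stale

# Agent B re-reads the current plan and retries on top of it.
v_b, plan_b = board.read("plan", default=[])
ok_retry = board.compare_and_set("plan", v_b, plan_b + ["draft"])
print(ok_a, ok_b, ok_retry, board.read("plan"))
# prints: True False True (2, ['research', 'draft'])
```

The rejected write is the stale read made explicit: agent B is told its view is out of date instead of silently overwriting agent A's contribution.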

This is not an analogy that merely rhymes; it is the same pattern with the same fixes. The shared agent blackboard is the parameter-server-style shared-state pattern of Section 11.9, where many workers read and update one logically central store, transposed from gradient vectors to natural-language context. Keeping that store consistent costs synchronization, and synchronization costs time; the engineering question is how much consistency the team actually needs, the same question the consistency spectrum of Section 2.5 framed for storage systems.

Thesis Thread: The Blackboard Returns, Now Holding Context

The shared-state primitive this book keeps circling back to surfaces here in its agentic form. It began as the blackboard for distributed problem solving (Section 27.4), hardened into the parameter server where many workers read and write one sharded store (Section 11.9), and now reappears as the shared memory of an agent team. The substrate changed from facts to gradients to language-model context, but the question never did: how do many writers keep one logical state coherent without paying more for synchronization than the coordination is worth? Whenever you design an agent team's memory, you are choosing a point on the same consistency-versus-cost curve the rest of the book has been walking.

3. An Inconsistency, and the Synchronization That Fixes It Intermediate

The cleanest way to see the shared-state problem is to build the smallest version of it. The code below implements a Blackboard, a shared key/value memory that a team of stub agents (one thread each) reads and writes. Each agent records its task completions by incrementing a shared counter. Done correctly, fifty agents recording twenty completions each must leave the counter at exactly one thousand; that is the coherent state the whole team should agree on. The unsynchronized path has each agent read the counter, do a little work, then write back the incremented value, the classic read-modify-write that loses updates the moment two agents interleave. The synchronized path wraps the same read-modify-write in a lock so it executes as one atomic critical section.

import threading
import time

class Blackboard:
    """A shared key/value memory several agents read and write."""
    def __init__(self):
        self._store = {}
        self._lock = threading.Lock()

    def read(self, key, default=None):
        return self._store.get(key, default)

    def write(self, key, value):
        time.sleep(0.001)            # work between an agent's read and its write
        self._store[key] = value

    def update_locked(self, key, fn, default=None):
        with self._lock:             # read-modify-write as ONE critical section
            cur = self._store.get(key, default)
            time.sleep(0.001)
            self._store[key] = fn(cur)

def run_team(blackboard, locked, n_agents=50, increments=20):
    blackboard.write("done", 0)
    def agent():
        for _ in range(increments):
            if locked:
                blackboard.update_locked("done", lambda c: c + 1, default=0)
            else:
                cur = blackboard.read("done", 0)   # stale read
                blackboard.write("done", cur + 1)  # lost update under contention
    threads = [threading.Thread(target=agent) for _ in range(n_agents)]
    for t in threads: t.start()
    for t in threads: t.join()
    return blackboard.read("done"), n_agents * increments

for locked in (False, True):
    seen, expected = run_team(Blackboard(), locked=locked)
    tag = "WITH a lock" if locked else "WITHOUT synchronization"
    print(f"{tag:<24} done={seen:<5} expected={expected}  lost={expected - seen}")
Code 32.7.1: A shared blackboard memory written by a team of stub agents. The unsynchronized branch reads a possibly-stale value before writing; the update_locked branch performs the read-modify-write inside a lock so concurrent agents serialize.
WITHOUT synchronization  done=24    expected=1000  lost=976
WITH a lock              done=1000  expected=1000  lost=0
Output 32.7.1: Without synchronization, 976 of 1000 updates are lost in this run (the exact count varies with thread scheduling): agents read a stale counter and overwrite each other, and the team's shared state is wildly wrong. Under the lock, every update lands and the blackboard converges to the coherent total of 1000.

The unsynchronized run is not merely a little off; it lost 976 of 1000 updates, because while one agent held a stale value and slept, dozens of others read the same stale value and clobbered each other's writes. Translate the counter back into agent terms: it stands for any shared fact the team accumulates (tasks claimed, sub-results posted, a running plan). Lose updates to it and agents proceed on a picture of the world that silently disagrees with reality and with each other, which is exactly the incoherent-team failure mode. The lock costs something real: agents now wait their turn, and under heavy contention that wait is the bottleneck. That cost is the synchronization tax, and the whole craft of shared agent memory is paying as little of it as coherence allows, by locking at the finest granularity, by sharding the blackboard so unrelated keys never contend, or by accepting bounded staleness where the task tolerates it.
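Sharding the blackboard so that unrelated keys never contend can be sketched with one lock per key. This is an illustrative variant of the `Blackboard` in Code 32.7.1, not a drop-in replacement: `ShardedBlackboard`, `run_sharded`, and the key names are assumptions made for the example.

```python
import threading
from collections import defaultdict

class ShardedBlackboard:
    """One lock per key, so writers to unrelated keys never wait on each other."""
    def __init__(self):
        self._store = {}
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()    # protects creation of per-key locks

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks[key]

    def update(self, key, fn, default=None):
        with self._lock_for(key):               # critical section covers this key only
            self._store[key] = fn(self._store.get(key, default))

    def read(self, key, default=None):
        return self._store.get(key, default)

def run_sharded(n_agents=8, increments=200, keys=("research", "draft")):
    board = ShardedBlackboard()
    def agent(i):
        key = keys[i % len(keys)]               # agents split across two unrelated keys
        for _ in range(increments):
            board.update(key, lambda c: c + 1, default=0)
    threads = [threading.Thread(target=agent, args=(i,)) for i in range(n_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {k: board.read(k) for k in keys}

totals = run_sharded()
print(totals)   # {'research': 800, 'draft': 800}
```

Agents writing `research` serialize only against each other; an agent writing `draft` never waits for them. The short guard lock is held only long enough to look up a key's lock, so it adds little contention of its own.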

Practical Example: The Research Crew That Wrote Five Different Reports

Who: A platform team shipping a multi-agent research assistant that fans a query out to several worker agents and composes one report.

Situation: A planner agent decomposed a query into sub-questions and posted them to a shared scratchpad; worker agents claimed sub-questions, researched them, and wrote findings back to the same scratchpad.

Problem: Final reports were duplicating sections and dropping others, and the defect was intermittent, appearing only under load when many workers ran at once.

Dilemma: Serialize every scratchpad access for guaranteed coherence at the cost of throughput, or keep the fast concurrent path and hope the races stayed rare enough to ignore.

Decision: They serialized only the claim-and-write of a sub-question (a per-item critical section), leaving reads of unrelated items concurrent, the fine-grained-locking middle path.

How: Each sub-question became its own key guarded by its own lock; a worker atomically checked "unclaimed?" and set "claimed-by-me" before researching, so two workers could never claim the same item.

Result: Duplicate and missing sections disappeared, and because locks were per-item rather than global, throughput barely moved; the shared state stayed coherent under load.

Lesson: The lost-update anomaly of Output 32.7.1 is not a textbook curiosity; it is the default behavior of any shared agent scratchpad written concurrently. Lock at the granularity of the contended item, not the whole board.
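The per-item claim described above can be sketched as follows. The team's real implementation is not shown in this section, so every name here (`ClaimBoard`, `try_claim`, the `q0`..`q4` items, the `w0`..`w3` workers) is hypothetical; the point is only that the "unclaimed?" check and the "claimed-by-me" write sit inside one critical section per item.

```python
import threading

class ClaimBoard:
    """Sub-questions that workers claim atomically, one lock per item."""
    def __init__(self, items):
        self._owner = {item: None for item in items}
        self._locks = {item: threading.Lock() for item in items}

    def try_claim(self, item, worker):
        with self._locks[item]:                 # check and set are one critical section
            if self._owner[item] is None:
                self._owner[item] = worker
                return True
            return False

    def owners(self):
        return dict(self._owner)

items = [f"q{i}" for i in range(5)]
board = ClaimBoard(items)
claimed = {}                                    # worker -> items it won
record_lock = threading.Lock()

def worker(name):
    for item in items:                          # every worker tries every item
        if board.try_claim(item, name):
            with record_lock:
                claimed.setdefault(name, []).append(item)

threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

won = [item for wins in claimed.values() for item in wins]
print(sorted(won))   # ['q0', 'q1', 'q2', 'q3', 'q4']
```

Which worker wins which item varies from run to run, but every item is won exactly once, so no section is duplicated and none is dropped.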

4. Context Management: The Cost of Carrying the Shared Picture Intermediate

Consistency is one tax on shared memory; the size of the shared context is another. Working memory is finite, so the running blackboard cannot simply grow without bound: at some point the accumulated message history no longer fits the window. The standard remedy is summarization, also called compaction, where older context is condensed into a shorter summary that preserves the load-bearing facts and discards the rest. A team might keep the last few turns verbatim and replace everything older with a running summary, trading fidelity for room.
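The keep-recent, summarize-older policy is short enough to sketch directly. The `summarize` function is a hypothetical placeholder for a call to a language model, and `compact` and `keep_last` are illustrative names; a real policy would also decide which facts the summary must preserve.

```python
def summarize(messages):
    """Hypothetical stand-in for an LLM summarization call."""
    return f"[summary of {len(messages)} earlier messages]"

def compact(history, keep_last=3):
    """Keep the newest `keep_last` messages verbatim; fold the rest into one summary."""
    if len(history) <= keep_last:
        return list(history)
    older, recent = history[:-keep_last], history[-keep_last:]
    return [summarize(older)] + recent

history = [f"turn {i}" for i in range(1, 8)]    # seven turns on the blackboard
compacted = compact(history, keep_last=3)
print(compacted)
# ['[summary of 4 earlier messages]', 'turn 5', 'turn 6', 'turn 7']
```

Applied repeatedly, this sketch folds earlier summaries into newer ones, so fidelity decays with each pass; that is the fidelity-for-room trade made explicit.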

The economics are sharper in the multi-agent case because of a multiplier. Suppose the shared context holds $T$ tokens and the team has $A$ agents, each of which must see that context to act. If every agent reads the full shared context on every one of its calls, and the team makes $C$ calls in total, the tokens paid for scale as

$$\text{cost} \;\propto\; \sum_{c=1}^{C} T_c,$$

where $T_c$ is the context size at call $c$. The trap is that $T_c$ tends to grow over a session as the blackboard accumulates, and it is paid at every agent's every call. If the shared context grows linearly with the conversation, the sum grows quadratically in the number of calls, and a team of $A$ agents that each make as many calls as a lone agent would pays at least roughly $A$ times the single-agent bill; a long unmanaged history makes each $T_c$ larger still. This is why compaction and selective context (giving each agent only the slice of the blackboard it needs, not the whole board) are not optimizations to add later; they are what keep a multi-agent system affordable and fast, since latency tracks tokens too.
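A few lines of arithmetic make the multiplier concrete. The sketch below evaluates the sum for a context that starts at a fixed size and grows by a fixed amount per call; the function name `session_tokens`, its parameters, and the numbers are all assumptions chosen for illustration, and we take $T_c$ to be the context size just before call $c$ adds its own output.

```python
def session_tokens(start, growth, calls, cap=None, fraction=1.0):
    """Total input tokens over a session: the sum of T_c over all calls.

    start:    context size at the first call
    growth:   tokens added to the shared context by each call
    cap:      optional compaction ceiling on the context size
    fraction: share of the board each call actually reads (selective context)
    """
    total = 0
    for c in range(calls):
        t_c = start + growth * c
        if cap is not None:
            t_c = min(t_c, cap)
        total += t_c * fraction
    return int(total)

unmanaged = session_tokens(1000, 200, 30)                 # full, growing board
capped = session_tokens(1000, 200, 30, cap=3000)          # compaction at 3000 tokens
selective = session_tokens(1000, 200, 30, fraction=0.5)   # each call reads half
print(unmanaged, capped, selective)   # 117000 79000 58500
```

For linear growth the unmanaged sum has the closed form $C\,T_0 + g\,C(C-1)/2$, which is quadratic in the number of calls $C$. A cap turns the tail of the sum linear, so its saving grows with session length, whereas a fixed read fraction scales the whole bill by a constant.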

Library Shortcut: LangGraph Checkpointers and Letta Manage the Memory for You

In Code 32.7.1 we built the shared store, the lock, and the read-modify-write loop by hand. Production agent frameworks expose shared state and long-term memory as first-class objects so you do not reimplement the concurrency. LangGraph threads a typed shared state through the graph and persists it with a checkpointer, giving every node a consistent read and a serialized write; Letta (the framework that grew out of the MemGPT research below) manages the working-versus-long-term split automatically, paging facts in and out of the context window.

# LangGraph: shared state + a checkpointer give consistent, persisted memory
from langgraph.graph import StateGraph, MessagesState
from langgraph.checkpoint.memory import InMemorySaver

builder = StateGraph(MessagesState)          # MessagesState IS the shared blackboard
# ... add_node(planner), add_node(researcher), add_edge(...) ...
graph = builder.compile(checkpointer=InMemorySaver())  # persists + serializes state

# Letta: agents with self-managed long-term memory
# (client SDK: pip install letta-client; assumes a Letta server is running at base_url)
from letta_client import Letta
client = Letta(base_url="http://localhost:8283")
agent = client.agents.create(model="openai/gpt-4o-mini",
                             memory_blocks=[{"label": "shared", "value": "team scratchpad"}])
Code 32.7.2: The hand-rolled blackboard, lock, and memory bookkeeping of Code 32.7.1 collapse to a typed shared state plus a checkpointer in LangGraph, or to self-managed memory blocks in Letta. The framework owns the consistent read, the serialized write, and the page-in/page-out of long-term memory.
Fun Note: The Goldfish Standup

An agent with no long-term memory and a small window is a coworker with the recall of a goldfish: brilliant for ninety seconds, then convinced the meeting just started. Multi-agent memory frameworks exist so the standup does not reset every time someone speaks. The blackboard is the shared notebook everyone scribbles in precisely because nobody trusts their own goldfish to remember what the team already decided.

5. Why Memory Often Makes or Breaks the System Advanced

It is tempting to treat memory as plumbing and spend design effort on the agents' reasoning prompts. In practice the opposite allocation usually pays off, because the failure modes that sink multi-agent systems are overwhelmingly memory and state failures rather than reasoning failures. An agent that reasons perfectly over a stale or inconsistent context produces a confidently wrong action; a team whose shared state diverges produces members that work hard at cross purposes. The lost updates in Output 32.7.1 and the divergent reports in the practical example were not reasoning bugs, they were shared-state bugs, and no amount of prompt engineering fixes a lost write.

The three taxes compose into a single design tension. More agents and more shared context buy more capability, but they raise the consistency cost (more writers contending), the staleness risk (more readers acting on old views), and the token cost (more context read more times). A system that ignores any one of them degrades in a characteristic way: ignore consistency and the team is incoherent, ignore staleness and it is subtly out of date, ignore token cost and it is slow and expensive. Getting the shared memory right, choosing the consistency level, the locking granularity, and the compaction policy that the task actually needs, is therefore not a finishing touch on a multi-agent system; it is frequently the thing that determines whether the system works.

Research Frontier: Agent Memory Systems (2024 to 2026)

Because working memory is the binding constraint, a fast-moving research line treats the context window as a managed resource rather than a fixed budget. MemGPT (Packer et al., 2023, with rapid 2024 to 2026 development) frames the language model as an operating system that pages information between a small in-context working memory and a large external store, deciding for itself what to keep resident and what to evict, and ships in production as the Letta framework. Parallel efforts build explicit long-term memory layers for agents: systems in the lineage of MemGPT, A-MEM, and Mem0-style memory services add episodic and semantic stores with learned write and retrieval policies, while work on context compaction studies which summaries preserve task-relevant information under aggressive token budgets. For multi-agent teams specifically, recent work asks how to keep a shared memory consistent without a global lock, borrowing conflict-free replicated data types and bounded-staleness ideas from distributed databases. We carry the retrieval half of this story from Chapter 25; the open question the field is converging on is how a team of agents should share, version, and reconcile memory at scale.
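To give a flavor of the conflict-free replicated data types mentioned above, here is the simplest one, a grow-only counter (G-Counter). It is a textbook CRDT shown for illustration, not a method taken from any of the systems named in this box; the class name `GCounter` and the agent names are assumptions. Each agent increments only its own slot, and replicas merge by element-wise maximum, so no lock is needed and merges can arrive in any order.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per agent, merged by element-wise max."""
    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.counts = {}

    def increment(self, n=1):
        # An agent only ever writes its own slot, so local updates never conflict.
        self.counts[self.agent_id] = self.counts.get(self.agent_id, 0) + n

    def merge(self, other):
        # Max is commutative, associative, and idempotent, so merge order is irrelevant.
        for agent, count in other.counts.items():
            self.counts[agent] = max(self.counts.get(agent, 0), count)

    def value(self):
        return sum(self.counts.values())

planner, researcher = GCounter("planner"), GCounter("researcher")
planner.increment(3)          # each replica records completions locally, no lock
researcher.increment(2)

planner.merge(researcher)     # replicas exchange state in any order...
researcher.merge(planner)
planner.merge(researcher)     # ...and re-merging is harmless
print(planner.value(), researcher.value())   # 5 5
```

Between merges the two replicas disagree, which is bounded staleness by construction; after merging they converge on the same total without any writer having waited for another.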

With the shared-memory substrate in place, the remaining question is operational: what engine actually hosts these agents, routes their messages, and runs the blackboard reliably across machines? That is the orchestration layer, and Section 32.8 turns to the distributed orchestration engines that provide it.

Exercise 32.7.1: Classify the Memory Conceptual

For a customer-support agent team, assign each item to one of the four memory types in Table 32.7.1 and justify the choice: (a) the three most recent messages in the current ticket; (b) the company's refund policy document; (c) the transcript of how a similar ticket was resolved last month; (d) the running list of sub-tasks the agents have claimed on this ticket. For item (d), explain why it is the one that raises a distributed-shared-state problem the other three do not.

Exercise 32.7.2: Break, Then Fix, the Blackboard Coding

Extend Code 32.7.1 so the blackboard holds a shared list of "claimed tasks" rather than a counter, and have each of $A$ agents claim distinct tasks concurrently. First use an unsynchronized check-then-claim (read the list, if the task is absent append it) and show that under contention two agents claim the same task. Then fix it two ways: a single global lock around the whole claim, and a finer-grained scheme with one lock per task key. Measure wall-clock time for both fixes as you raise $A$, and explain the throughput difference in terms of the synchronization tax discussed in Section 3.

Exercise 32.7.3: The Token Bill of a Shared Context Analysis

A team of $A = 4$ agents shares a blackboard whose context grows by 300 tokens per agent call and starts at 500 tokens. Every agent reads the full shared context on every call, and the session runs for $C = 40$ total calls split evenly across agents. Using the cost expression in Section 4, estimate the total input tokens paid across the session. Then estimate the saving from a compaction policy that caps the shared context at 2000 tokens by summarizing older content, and from selective context that gives each agent only half the board on average. State which lever helps more here and why the answer depends on $C$.