Section 29.4: Communication

"I sent a perfectly clear request. The other agent inferred a perfectly clear, completely different intent. We have been negotiating the meaning of the word 'now' for three rounds."
An Agent Stuck Waiting on a Lock

Big Picture

Communication is the substrate of every multi-agent system: agents that cannot exchange information cannot coordinate, and so before negotiation, coalition formation, or consensus can happen, the agents must agree on what a message is, what it means, and when it is allowed to be sent. The previous sections built single agents and the environments they share; this section is about the wire between them. We treat a message not as a string but as a speech act that carries intent (request, propose, accept), we wrap those acts in interaction protocols that keep a conversation coherent, and we confront the hard problem that two agents only transfer knowledge if they share the meaning of the symbols they send. We also count the cost: every message is bytes on a network, the same scale-out tax that governs every other chapter of this book, so well-designed agents communicate selectively. Finally we look at how language-model agents dissolve the old vocabulary problem by talking in natural language, and what new problem that creates.

In Section 29.3 we placed agents in a shared environment and saw that each one perceives only a slice of the whole. Communication is how an agent borrows another agent's slice: it is the mechanism by which private state becomes shared state. A multi-agent system is, at bottom, a distributed system whose nodes happen to be reasoning agents, and like any distributed system its behavior is determined less by what each node computes than by what the nodes tell each other and when. This section develops communication from the ground up, starting from the question of what should even count as a message.

Figure 29.4.1: Two ways agents exchange information. On the left, direct communication: agents send typed performative messages (request, propose, accept, inform) to named recipients, all wrapped in a protocol envelope that fixes the legal order of messages. On the right, indirect communication (stigmergy): agents never address each other; one writes a mark into the shared environment and another reads it later, the coordination mechanism behind swarms (Chapter 31) and the blackboard (Section 27.4).

1. Messages as Speech Acts Beginner

The naive view of a message is that it is data: agent A copies a value into a buffer and agent B reads it. That view is enough for moving numbers, but it is too thin for agents that reason, because it loses the intent behind the bytes. The same content, the string "the server is at capacity," means something different when it is an answer to a question, an unsolicited warning, or a command to stop sending traffic. Speech-act theory, imported into multi-agent systems from the philosophy of language, says that the primary unit of communication is not the proposition but the act the speaker performs by uttering it. Asserting, asking, promising, and ordering are different acts even when the propositional content is identical.

Agent communication languages turn this idea into a wire format. A message has three separable parts: a performative (the speech act, also called the message type), the content (the proposition the act is about), and a protocol context (the conversation the message belongs to, plus sender, receiver, and a reply deadline). The two historically important languages, KQML (the Knowledge Query and Manipulation Language) and the later FIPA-ACL (the Foundation for Intelligent Physical Agents Agent Communication Language), differ in details but agree on this skeleton. FIPA-ACL fixes a standard set of performatives, including inform (assert a fact), request (ask the receiver to act), propose (offer to act under stated conditions), accept-proposal, reject-proposal, and refuse. Because the performative is explicit, the receiver knows what kind of act it is responding to without having to infer it from the content.

Key Insight: The Performative Carries the Intent, So the Content Does Not Have To

Separating the speech act from the proposition is what lets agents reason about a message without interpreting its content. A scheduler that receives refuse knows the task was declined and can rebid it, even if it never understood the task's content; a logger that sees inform knows a fact was asserted and can store it without acting on it. The performative is a small, closed vocabulary that every agent shares by construction, which is why it is the one part of a message that never suffers the semantic-interoperability problem that plagues the content.
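As a minimal sketch of this idea, the snippet below routes messages on the performative alone and never inspects the content. The Performative enum, the Scheduler class, and the opaque payload are hypothetical names introduced for illustration; they are not part of FIPA-ACL or of any library.

```python
from enum import Enum

class Performative(Enum):
    """A small, closed vocabulary shared by construction (illustrative subset)."""
    INFORM = "inform"
    REQUEST = "request"
    REFUSE = "refuse"

class Scheduler:
    """Reacts to the speech act alone; the content stays an opaque blob."""
    def __init__(self):
        self.rebid_queue = []   # tasks that were declined and need a new bidder
        self.facts = []         # asserted facts, stored without interpretation

    def receive(self, performative, content):
        if performative is Performative.REFUSE:
            self.rebid_queue.append(content)     # declined: rebid, no parsing needed
        elif performative is Performative.INFORM:
            self.facts.append(content)           # asserted: store, do not act
        else:
            raise ValueError(f"unhandled performative: {performative.value}")

opaque_task = b"\x00\x17\x2a"    # bytes the scheduler cannot decode
s = Scheduler()
s.receive(Performative.REFUSE, opaque_task)
s.receive(Performative.INFORM, "disk 4 is full")
print(len(s.rebid_queue), len(s.facts))
```

The scheduler handles the refusal correctly even though the task payload is undecodable bytes, which is the point of keeping the act separate from the proposition.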

2. Interaction Protocols: Conversations With Rules Beginner

A single message is rarely the whole story. Real coordination is a conversation: a request expects a reply, a call for bids expects proposals, an offer expects an acceptance or a rejection. An interaction protocol is a specification of the legal sequences of messages in such a conversation, a small state machine that every participant follows. The protocol constrains the message flow so that agents stay synchronized: after A sends request, the protocol says B may respond only with agree or refuse, and an inform arriving out of turn is a protocol error rather than a fact to be believed.
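The state-machine view can be sketched in a few lines. The transition table below is a simplified, hypothetical rendering of a request conversation (the state names and the failure branch are our assumptions, not a transcription of the FIPA specification); it shows how an inform arriving out of turn becomes a protocol error.

```python
# Legal transitions of a simplified request conversation:
# state -> {performative that may arrive next: resulting state}
REQUEST_PROTOCOL = {
    "start":     {"request": "requested"},
    "requested": {"agree": "agreed", "refuse": "closed"},
    "agreed":    {"inform": "closed", "failure": "closed"},
    "closed":    {},
}

class ProtocolError(Exception):
    """Raised when a message arrives that the protocol does not allow here."""

def run_conversation(performatives, protocol=REQUEST_PROTOCOL):
    """Replay a sequence of performatives and return the final state."""
    state = "start"
    for p in performatives:
        allowed = protocol[state]
        if p not in allowed:
            raise ProtocolError(f"{p!r} is not legal in state {state!r}")
        state = allowed[p]
    return state

print(run_conversation(["request", "agree", "inform"]))   # a legal conversation

try:
    run_conversation(["request", "inform"])               # inform out of turn
except ProtocolError as err:
    print("protocol error:", err)
```

Keeping the protocol as data rather than as scattered if-statements means every participant can validate against the same table.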

The simplest protocol is request-reply, two messages and done. More interesting is the contract-net protocol, introduced for distributed task allocation in Section 27.5: a manager broadcasts a call for proposals (cfp), contractors reply with propose or refuse, the manager awards the task with accept-proposal to the best bidder, and the winner returns an inform when done. Auctions, which we develop for negotiation in Section 29.6, are protocols of the same family: a structured exchange that turns a swarm of self-interested messages into a single allocation decision. The protocol is what makes the exchange a market rather than a shouting match.

The demo below implements contract-net from scratch, with explicit performatives, to show a protocol turning a stream of typed messages into a task allocation. A manager agent announces three tasks; three worker agents bid only on tasks they have the skill for and refuse the rest; the manager awards each task to the lowest bidder.

from dataclasses import dataclass

# A message is a performative (speech act) plus content and a conversation id.
@dataclass
class Message:
    performative: str          # cfp, propose, refuse, accept-proposal, inform
    sender: str
    receiver: str
    content: object
    conversation: str

class Bus:
    """Records every message passed to send(); handler replies are returned, not logged."""
    def __init__(self):
        self.log = []
    def send(self, msg, handler):
        self.log.append(msg)
        return handler(msg)

class Worker:
    """A contractor agent. Bids on tasks it can serve, refuses ones it cannot."""
    def __init__(self, name, skills, load):
        self.name, self.skills, self.load = name, skills, load
    def handle_cfp(self, msg):
        task = msg.content
        if task["type"] not in self.skills:
            return Message("refuse", self.name, msg.sender, None, msg.conversation)
        cost = self.load + task["size"]                # lower bid is better
        return Message("propose", self.name, msg.sender, cost, msg.conversation)
    def handle_award(self, msg):
        self.load += msg.content["size"]
        return Message("inform", self.name, msg.sender, "done", msg.conversation)

class Manager:
    """Initiator: announce a task, collect bids, award to the cheapest bidder."""
    def __init__(self, name, bus, workers):
        self.name, self.bus, self.workers = name, bus, workers
    def allocate(self, task, conv):
        bids = []
        for w in self.workers:                         # broadcast the cfp
            reply = self.bus.send(Message("cfp", self.name, w.name, task, conv),
                                  w.handle_cfp)
            if reply.performative == "propose":
                bids.append((reply.content, w))
        cost, winner = min(bids, key=lambda b: b[0])   # pick the best proposal
        done = self.bus.send(
            Message("accept-proposal", self.name, winner.name, task, conv),
            winner.handle_award)
        return winner.name, cost, done.content

bus = Bus()
workers = [Worker("alice", {"ocr", "embed"}, load=2),
           Worker("bob",   {"ocr"},          load=0),
           Worker("carol", {"embed", "rank"}, load=1)]
mgr = Manager("manager", bus, workers)

for i, task in enumerate([{"type": "ocr",   "size": 3},
                          {"type": "embed", "size": 1},
                          {"type": "rank",  "size": 2}]):
    name, cost, report = mgr.allocate(task, f"c{i}")
    print(f"task {task['type']:>5} (size {task['size']}) -> awarded to {name:>5} "
          f"at bid {cost}, contractor reports: {report}")

print("\nfinal worker backlog:", {w.name: w.load for w in workers})
print("\nconversation c0 (the ocr task), one line per message:")
for m in bus.log:
    if m.conversation == "c0":
        c = m.content if m.content is not None else ""
        print(f"  {m.sender:>7} -> {m.receiver:<7} {m.performative:<16} {c}")

Code 29.4.1: A contract-net protocol built from performative messages. Each Message carries an explicit speech act; the Manager drives the legal sequence (cfp, then propose/refuse, then accept-proposal, then inform), and the Bus log lets us replay the manager's side of one conversation message by message. Worker replies are returned to the caller rather than logged.

task   ocr (size 3) -> awarded to   bob at bid 3, contractor reports: done
task embed (size 1) -> awarded to carol at bid 2, contractor reports: done
task  rank (size 2) -> awarded to carol at bid 4, contractor reports: done

final worker backlog: {'alice': 2, 'bob': 3, 'carol': 4}

conversation c0 (the ocr task), one line per message:
  manager -> alice   cfp              {'type': 'ocr', 'size': 3}
  manager -> bob     cfp              {'type': 'ocr', 'size': 3}
  manager -> carol   cfp              {'type': 'ocr', 'size': 3}
  manager -> bob     accept-proposal  {'type': 'ocr', 'size': 3}

Output 29.4.1: The protocol allocates each task to the lowest available bidder. The ocr task goes to bob (idle, bid 3) rather than alice (loaded, bid 5); carol wins both embed and rank because she is the only bidder for rank and the cheapest for embed. The replayed c0 conversation shows the manager broadcasting cfp to all three, then sending accept-proposal only to the winner.

Notice that a worker lacking the required skill returns refuse: carol's reply to the ocr cfp was a refusal, so it never entered the bid set. The c0 trace shows none of the replies, whether refusals or proposals, because the Bus logs only the messages handed to send (here the manager's cfp and accept-proposal); each worker's propose, refuse, or inform comes back as the handler's return value. The protocol, not any single agent, is what guarantees that exactly one contractor is awarded each task. Change the protocol (let two managers run concurrently, say) and you must reason about message interleavings, which is exactly the coordination problem of Section 29.5.

3. Ontologies: Agreeing on What Words Mean Intermediate

A performative tells the receiver what kind of act a message is, but it says nothing about whether the receiver understands the content. If the manager's cfp describes a task as {"type": "ocr"} and a worker has only ever heard that capability called text-extraction, the message transfers no knowledge: the symbols do not line up. This is the semantic-interoperability problem, and it is the central difficulty of classical agent communication. Two agents transfer knowledge only if they share an ontology: an explicit, agreed vocabulary that fixes the concepts, their attributes, and the relations between them, so that a symbol on the wire denotes the same thing in both agents' heads.

Ontologies are why the FIPA program was as much about standardizing content languages and shared vocabularies as about the message envelope. In a closed system you can sidestep the problem by design: all agents are built by one team against one schema, and the ontology is implicit in the shared code (as it is in Code 29.4.1, where every agent agrees that a task is a dict with type and size). In an open system, where agents from different vendors must interoperate, the ontology must be made explicit and published, and mismatches must be reconciled by mapping one vocabulary onto another. This reconciliation, called ontology alignment, is hard precisely because meaning is not in the symbols; it is in the agreement about the symbols.
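To make the mismatch concrete, here is a deliberately tiny sketch of alignment as a lookup table. The vocabularies, the ALIGNMENT mapping, and the function names are hypothetical; real ontology alignment must reconcile attributes and relations as well as names, which a flat dictionary cannot do.

```python
# Hypothetical vocabularies: the manager says "ocr", the vendor's worker says
# "text-extraction". The alignment table maps one onto the other.
ALIGNMENT = {"ocr": "text-extraction", "embed": "vectorize"}

WORKER_SKILLS = {"text-extraction"}   # expressed in the worker's own vocabulary

def translate(task, alignment):
    """Rewrite the task type into the receiver's vocabulary, or return None."""
    mapped = alignment.get(task["type"])
    if mapped is None:
        return None                      # no agreed meaning: nothing transfers
    return {**task, "type": mapped}

def worker_can_serve(task):
    """True only if the task is expressed in a symbol the worker knows."""
    return task is not None and task["type"] in WORKER_SKILLS

cfp = {"type": "ocr", "size": 3}
print(worker_can_serve(cfp))                          # raw symbol does not line up
print(worker_can_serve(translate(cfp, ALIGNMENT)))    # aligned symbol does
print(translate({"type": "rank", "size": 2}, ALIGNMENT))
```

The unaligned cfp is rejected even though the worker has exactly the capability being asked for; only the published mapping lets the symbols line up.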

Fun Note: The Two Agents That Agreed on Everything Except Units

A classic failure mode of shared-ontology systems is the agents that share every symbol and still disagree, because the ontology fixed the name of a quantity but not its unit. One agent's distance: 5 meant kilometers; the other read meters and confidently planned a route a thousand times too short. The symbols matched perfectly. The meanings did not. An ontology that omits units is an agreement to misunderstand each other precisely.

4. Direct Versus Indirect Communication Intermediate

Everything so far has been direct communication: an agent addresses a message to a named recipient. There is a second, very different mode, sketched on the right of Figure 29.4.1. In indirect communication, agents never address each other at all; they communicate by modifying the shared environment and observing the modifications others make. This is stigmergy, a term borrowed from the study of social insects: an ant deposits pheromone on a trail, and later ants are influenced not by the first ant but by the mark it left. No message was sent to anyone, yet information flowed.

Stigmergy is the communication mechanism behind swarm coordination, which Chapter 31 develops in full: simple agents, each following local rules, produce coherent global behavior because the environment carries the shared state between them. The blackboard architecture of Section 27.4 is the same idea in a knowledge-based setting: specialists never call each other directly; each reads the current blackboard, contributes what it can, and writes the result back for others to find. Indirect communication trades the precision of addressed messages for two real advantages. It is anonymous, so agents can join and leave without anyone updating a contact list, and it decouples sender from receiver in time, since a mark written now can be read much later. Its cost is that the environment becomes shared mutable state, with all the coordination hazards that implies, which is why stigmergic systems lean on the conflict-resolution machinery of Section 29.5.
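A toy version of stigmergy fits in a few lines. In the sketch below the Environment class, the cell names, and the evaporation rate are all illustrative assumptions (evaporation is a common ingredient of ant-inspired models but is not something this section has specified); the point is only that the writer and the reader never exchange a message.

```python
class Environment:
    """Shared mutable state: pheromone marks keyed by cell name."""
    def __init__(self, evaporation=0.5):
        self.marks = {}
        self.evaporation = evaporation   # fraction of each mark kept per tick

    def deposit(self, cell, amount=1.0):
        """Leave a mark; no recipient is named."""
        self.marks[cell] = self.marks.get(cell, 0.0) + amount

    def strongest(self, cells):
        """Return the candidate cell with the most pheromone (ties: first)."""
        return max(cells, key=lambda c: self.marks.get(c, 0.0))

    def tick(self):
        """Decay every mark so stale information fades."""
        self.marks = {c: v * self.evaporation for c, v in self.marks.items()}

env = Environment()
env.deposit("east")            # written by an earlier agent, addressed to no one
env.deposit("east")
env.deposit("north")
env.tick()                     # time passes; writer and reader are decoupled

choice = env.strongest(["north", "east", "south"])   # a later agent reads the marks
print(choice, env.marks)
```

Note that marks is plain shared mutable state: with concurrent writers, deposit would need the kind of conflict resolution the paragraph above points to.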

5. The Cost of Communication Intermediate

A message is never free. Every byte an agent sends crosses a network with finite bandwidth and nonzero latency, and in a system of $n$ agents the temptation to let everyone tell everyone else everything leads to $O(n^2)$ message traffic ($n(n-1)$ messages per round), which saturates a real network quickly as $n$ grows. This is the same scale-out tax that governs collective communication in distributed training, modeled with the latency-bandwidth (alpha-beta) cost in Section 4.1: the time to move a message of $b$ bytes is approximately

$$T_{\text{msg}}(b) = \alpha + \beta\, b,$$

where $\alpha$ is the fixed per-message latency and $\beta$ is the inverse bandwidth. The fixed cost $\alpha$ is why many small messages are far worse than one batched message of the same total size, and the term $\beta\,b$ is why verbose messages hurt. The practical consequence is that well-designed agents communicate selectively: they send a message only when its expected value to the recipient exceeds its cost, they batch where they can, and they prefer broadcasting a single mark to the environment over addressing $n$ separate recipients when the audience is large. An agent architecture that ignores communication cost will be correct in a simulator and unusable on a real cluster, the same lesson the rest of this book teaches about gradients and activations.
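A quick calculation shows how strongly the fixed term punishes small messages. The constants below are assumed, round-number values chosen for illustration, not measurements of any real network.

```python
def t_msg(b, alpha, beta):
    """Alpha-beta cost of one message of b bytes."""
    return alpha + beta * b

# Assumed, illustrative constants (not measurements):
ALPHA = 50e-6      # 50 microseconds of fixed latency per message
BETA = 1e-9        # 1 nanosecond per byte, i.e. roughly 1 GB/s

total_bytes = 100_000
k = 1_000                                        # number of small messages

many_small = k * t_msg(total_bytes / k, ALPHA, BETA)   # pays alpha k times
one_batch = t_msg(total_bytes, ALPHA, BETA)            # pays alpha once

print(f"{k} small messages: {many_small * 1e3:.2f} ms")
print(f"1 batched message:  {one_batch * 1e3:.2f} ms")
```

Under these assumptions the thousand small messages take about 50 ms while the single batch takes about 0.15 ms: both move the same bytes, so the $\beta\,b$ term is identical and the entire gap is the repeated $\alpha$.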

Practical Example: The Trading Desk That Drowned in Its Own Chatter

Who: A platform engineer running a fleet of automated market-making agents at a brokerage.

Situation: Each of forty agents broadcast its full order book to every other agent on every price tick to keep a shared view, a direct all-to-all communication pattern.

Problem: At forty agents the broadcast traffic scaled as the square of the fleet size; adding agents made the shared view staler, not fresher, because messages queued behind a saturated link.

Dilemma: Keep direct all-to-all messaging, which is simple and precise but $O(n^2)$, or switch to indirect communication through a shared in-memory book that every agent reads and writes, anonymous and $O(n)$ but introducing contention on shared state.

Decision: They moved to a shared book (a stigmergic, blackboard-style design) and had agents publish only deltas, not full snapshots, applying the selectivity principle directly.

How: Each agent wrote its own updates to a partitioned shared structure and read others' updates from it; per-message size fell because only changes were sent, and the message count fell from $O(n^2)$ to $O(n)$.

Result: Network traffic dropped by more than an order of magnitude and the shared view became fresher under load, because the $\alpha$ term in $T_{\text{msg}}$ was paid far fewer times.

Lesson: When the audience is the whole fleet, indirect communication through the environment often beats addressed messages, and sending deltas instead of snapshots attacks the $\beta\,b$ term directly.
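The message counts behind this example are easy to check. The sketch below assumes, for simplicity, that in the shared-book design each agent performs exactly one write and one read per tick; the function names are ours.

```python
def direct_messages(n):
    """All-to-all: each of n agents sends to the other n - 1."""
    return n * (n - 1)

def shared_store_accesses(n):
    """Shared book: each agent performs one write and one read per tick (assumed)."""
    return 2 * n

for n in (10, 40, 160):
    print(n, direct_messages(n), shared_store_accesses(n))
```

At the desk's fleet size of forty this is 1,560 addressed messages per tick against 80 store accesses, and quadrupling the fleet multiplies the first figure by about sixteen but the second by only four.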

6. The Natural-Language Turn Advanced

The classical picture, performatives plus a shared ontology plus a protocol, was engineered to make meaning unambiguous and machine-checkable. Language-model agents change the picture in one move: they communicate in natural language. Two LLM agents can exchange free-form English, and each interprets the other's message with the same broad linguistic competence it uses for everything else. This largely dissolves the semantic-interoperability problem that ontologies were built to solve, because the agents do not need a pre-agreed symbol for text-extraction; one can write "please pull the text out of this scan" and the other simply understands, mapping the request onto its own capabilities on the fly.

The dissolution is not free, and it is important to name the new costs plainly rather than treat natural language as a strict upgrade. Natural language is ambiguous in exactly the way ontologies forbid: "now," "the file," and "as soon as possible" mean what the context implies, and two agents can confidently infer different intents from the same sentence, the failure the epigraph dramatizes. It is also verbose: an English message carrying the information of a five-byte performative might be fifty tokens, so the $\beta\,b$ communication cost of Section 5 rises sharply, and in LLM systems those tokens are also a direct monetary and latency cost at inference time. The modern design pattern is therefore a hybrid: natural language for the open-ended content where flexibility pays, wrapped in a structured envelope of typed fields and explicit protocols where machine-checkable coordination is needed. That hybrid is precisely what the emerging agent-interoperability protocols formalize, which we take up in Section 32.6.
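One way to picture the hybrid is a typed envelope whose fields are validated mechanically while the body stays free-form text. The Envelope class, its field names, and the ALLOWED set below are hypothetical and are not drawn from any of the interoperability protocols mentioned in this section.

```python
from dataclasses import dataclass

# Closed performative vocabulary, taken from the FIPA-ACL examples in this section.
ALLOWED = {"request", "propose", "accept-proposal", "reject-proposal",
           "refuse", "inform"}

@dataclass(frozen=True)
class Envelope:
    performative: str      # machine-checked, closed vocabulary
    sender: str
    receiver: str
    conversation: str
    text: str              # free-form natural language, interpreted by the LLM

    def __post_init__(self):
        # Structural checks only; the meaning of `text` is left to the receiver.
        if self.performative not in ALLOWED:
            raise ValueError(f"unknown performative: {self.performative!r}")
        if not self.text.strip():
            raise ValueError("empty natural-language content")

msg = Envelope("request", "planner", "extractor", "c7",
               "Please pull the text out of this scan and return it as plain text.")
print(msg.performative, "->", msg.receiver)

try:
    Envelope("asap-please", "planner", "extractor", "c7", "Do it now.")
except ValueError as err:
    print("rejected:", err)
```

The envelope catches structural errors cheaply and deterministically; the ambiguity of "now" in the body is still the receiving model's problem, which is why the hybrid reduces the new costs without removing them.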

Library Shortcut: A Structured Agent Message Without Hand-Rolling the Envelope

In Code 29.4.1 we defined the Message dataclass, the performative vocabulary, and the dispatch by hand. Modern agent frameworks ship the envelope so you declare only the content. In Microsoft AutoGen, sending a typed message between agents is a single structured call; the framework supplies the conversation id, sender, receiver, and the asynchronous transport that our Bus stood in for:

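# pip install autogen-core
from dataclasses import dataclass
from autogen_core import MessageContext, RoutedAgent, message_handler

@dataclass
class CallForProposal:          # the content; the framework adds the envelope
    task_type: str
    size: int

@dataclass
class Proposal:                 # the bid sent back in response to a cfp
    cost: int

class Contractor(RoutedAgent):
    def __init__(self, load: int = 0) -> None:
        super().__init__("Contractor that bids on calls for proposals")
        self.load = load

    @message_handler                                   # routes by message TYPE
    async def on_cfp(self, message: CallForProposal, ctx: MessageContext) -> None:
        # The handler's parameter must be named `message` and the return type annotated.
        bid = self.load + message.size                 # ctx carries sender and topic
        if ctx.topic_id is not None:                   # set only for published messages
            await self.publish_message(Proposal(bid), ctx.topic_id)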

Code 29.4.2: The same performative dispatch as Code 29.4.1, now declared by message type. The roughly thirty lines of Message, Bus, and manual routing collapse to a typed handler; the framework owns the conversation context, the message bus, and the asynchronous delivery, leaving you to write only the content classes and the bidding logic.

7. Where Communication Leads Beginner

We now have the substrate. Agents exchange typed performative messages whose intent is explicit; protocols constrain those messages into coherent conversations; ontologies (or, for language-model agents, shared natural language) make the content mean the same thing on both ends; stigmergy offers an indirect alternative when the audience is the whole fleet; and the cost of every message is the same scale-out tax this book has counted since Section 4.1. What we have not yet built is the logic that decides what to say to whom in order to act jointly. That is coordination, and the conversations of this section are its raw material.

Thesis Thread: Communication Is the Scale-Out Tax, Paid in Messages

This section is the multi-agent face of the book's central trade. In data-parallel training the tax is the all-reduce that synchronizes gradients (Section 4.1); among agents it is the message that synchronizes beliefs and intentions. The cost model $T_{\text{msg}}(b)=\alpha+\beta b$ is the same, the pressure toward selectivity and batching is the same, and the choice between addressed messages and shared-state (stigmergic) communication mirrors the choice between point-to-point and collective patterns in Section 4.1. Distributing intelligence, like distributing computation, is governed by what it costs to move information between the pieces.

Research Frontier: Emergent Communication and Agent Protocols (2024 to 2026)

Two active lines are reshaping agent communication. The first is emergent communication: rather than hand-designing an ontology, agents learn a communication protocol from scratch through multi-agent reinforcement learning, developing their own symbols to solve a shared task, with recent work studying when such learned languages become compositional and human-interpretable and connecting them to the MARL methods of Chapter 30. The second is the rapid standardization of natural-language agent protocols: Anthropic's Model Context Protocol (MCP, 2024) standardizes how an agent connects to tools and data, and Google's Agent2Agent (A2A, 2025) standardizes how autonomous agents discover and message one another, the modern, language-model-native descendants of FIPA-ACL. A parallel empirical thread documents the failure modes of natural-language coordination, including ambiguity-induced misalignment and runaway verbosity in multi-agent LLM systems, motivating the structured-envelope hybrids of Section 6. We develop these protocols with the orchestration machinery to use them in Section 32.6.

Exercise 29.4.1: Performative or Content? Conceptual

For each exchange, identify the performative being used and explain why labeling it explicitly (rather than leaving the receiver to infer intent from the content) changes how the receiver must respond: (a) "Task 7 is complete" sent in reply to a previous award; (b) "Can you encode these 10,000 images by 4pm?"; (c) "I will encode them for a cost of 12 units, if you also grant me priority on the next batch"; (d) "No." Then argue why the four-message contract-net sequence in Code 29.4.1 would break if the refuse performative were dropped and a refusal were sent as a bid of infinite cost instead.

Exercise 29.4.2: Extend the Protocol Coding

Modify Code 29.4.1 so the manager runs a true sealed-bid round with a deadline: collect every propose first, then, if no bidder can serve a task (all refuse), have the manager send an inform to itself recording the task as unallocated rather than crashing on an empty bid set. Add a fourth worker that shares a skill with an existing one and tie-breaks on a lower load, and verify from the Bus log that the protocol still awards each task exactly once. Report which tasks, if any, go unallocated and why.

Exercise 29.4.3: Count the Tax Analysis

Consider $n$ agents that must reach a shared view of a single value. In the direct design, every agent sends its value to every other agent. In the stigmergic design, every agent writes its value once to a shared store and reads the store once. Using the cost model $T_{\text{msg}}(b)=\alpha+\beta b$ from Section 5, write the total communication time of each design as a function of $n$ (count messages, and treat each store access as one message). State the crossover $n$ above which the stigmergic design wins when $\alpha$ dominates, and relate your answer to the trading-desk example and to the collective-versus-point-to-point trade-off in Section 4.1.