Part VI: Distributed AI and Multi-Agent Systems
Chapter 28: Game-Theoretic Foundations for Multi-Agent AI

Normal-Form and Extensive-Form Games

"Tell me how you wrote the game down, and I will tell you whether I can solve it before the heat death of the cluster."

A Search Tree That Has Seen the Branching Factor
Big Picture

Before an agent can reason about other agents, the interaction has to be written down, and there are exactly two canonical ways to write it: as a matrix of payoffs over simultaneous choices (normal form) or as a tree of sequential moves with the information each player holds when they act (extensive form). The choice is not cosmetic. It fixes what every agent in a multi-agent system must reason over, how much memory the description costs, and, above all, how the work of solving the game grows as players, actions, and turns multiply. A tree that captures the order of play is faithful to a turn-taking world, but its equivalent matrix is usually exponentially larger, and that blow-up is the reason game-playing AI leans on search and sampling rather than enumerating the matrix. This section builds both representations, computes payoffs in each, and solves a perfect-information tree by backward induction, so the rest of the chapter can talk about equilibria over objects you have already constructed by hand.

Section 28.1 argued that agents sharing an environment need game theory because each agent's best move depends on what the others do. That argument was about motivation. To do anything with it, we need a precise object: a description of who the players are, what they can do, what they know, and what they get. Game theory offers two such objects, and almost every model in the rest of this part is one of them or a stochastic generalization of one. The first, normal form, is built for choices made at the same instant with no signal of what the others are doing. The second, extensive form, is built for choices made in sequence, where later movers may or may not see what earlier movers did. We take them in that order, then connect them.

1. Normal Form: The Payoff Matrix Beginner

A game in normal form (also called strategic form) is the most compact description of a one-shot, simultaneous interaction. It has three ingredients and nothing else. There is a finite set of players $\{1, 2, \dots, n\}$. Each player $i$ has a finite set of actions $A_i$. And there is a payoff function that assigns, to every joint action profile $a = (a_1, a_2, \dots, a_n)$ with $a_i \in A_i$, a utility for each player:

$$u_i : A_1 \times A_2 \times \cdots \times A_n \to \mathbb{R}, \qquad u(a) = \big(u_1(a), u_2(a), \dots, u_n(a)\big).$$

The whole game is the tuple $(\,N,\, \{A_i\},\, \{u_i\}\,)$. For two players the payoff function is a matrix: rows are the first player's actions, columns are the second player's, and each cell holds the pair $(u_1, u_2)$. Nothing in this description mentions time or order, which is exactly the point. Normal form assumes the players move once, together, each blind to the others' choice, so the only thing that matters is the joint profile and the payoffs it yields. That assumption is a faithful model of a sealed-bid auction, a one-round routing decision, or two services independently picking a retry policy in the same instant.

Key Insight: The Representation Sets the Reasoning Burden

An agent does not reason about "the game" in the abstract; it reasons about whatever object you handed it. Normal form hands it a payoff function over joint profiles, so the agent must consider every combination of the other players' actions, a space whose size is the product of the action-set sizes, $\prod_i |A_i|$. Add a player or widen an action set and that space multiplies. This is the first place scale bites in multi-agent AI: the description an agent must search is exponential in the number of players, long before anyone computes an equilibrium.

2. Extensive Form: The Game Tree and Information Sets Intermediate

Most real interactions are not simultaneous. One agent acts, another observes and responds, a third reacts to both. The extensive form captures this with a tree. Each internal node is a decision point belonging to one player; each edge is an action; each leaf holds a payoff vector, one number per player. A play of the game is a path from the root to a leaf, and the order of nodes along that path is the order of moves. The tree makes the sequence explicit, which the matrix cannot.

The subtle ingredient is information. When a player is about to move, what do they know about how the game reached this point? The extensive form answers with information sets: a group of nodes among which the mover cannot tell which one they are actually at. If every information set is a single node, every player always knows the full history when they move, and the game has perfect information (chess, Go, tic-tac-toe). If some information set contains several nodes, the mover is uncertain about earlier hidden choices, and the game has imperfect information (poker, where you cannot see the opponent's cards; a sealed-bid auction modeled sequentially, where you do not observe rival bids). Information sets are how the extensive form encodes partial observability, the same property that Chapter 27 identified as the defining hardship of distributed intelligence.

Normal form: payoff matrix Row \ Col A B A B (2, 1) (0, 0) (0, 0) (1, 2) each cell: (row payoff, col payoff) for a simultaneous joint choice Extensive form: game tree P1 P2 P2 L R l r l r (3,1) (1,2) (0,0) (2,3) dashed oval = information set: if present, P2 cannot tell L from R (imperfect information); remove it for perfect
Figure 28.2.1: The same kind of two-player interaction in both representations. Left: a normal-form payoff matrix, where each cell gives (row payoff, column payoff) for a simultaneous joint choice. Right: an extensive-form tree, where Player 1 moves first (L or R), Player 2 responds (l or r), and leaves hold payoff pairs. The dashed red oval is an information set: drawn around both of Player 2's nodes it means Player 2 cannot see Player 1's move, giving imperfect information; treating each node as its own singleton information set (no oval) gives perfect information, which is the version the code in this section solves.

3. Strategies, Pure and Mixed Intermediate

An action is a single choice; a strategy is a complete plan. In normal form the distinction barely shows, because a player chooses once, so a pure strategy is just an action. In extensive form a pure strategy is heavier: it must specify an action at every information set the player could ever reach, even nodes that good play never visits. That is why a tree with a handful of decision nodes can spawn many strategies, a gap we make concrete in Section 5.

A pure strategy commits to one choice with certainty. A mixed strategy randomizes: player $i$ picks a probability distribution $\sigma_i$ over its actions and samples from it. Under a mixed profile $\sigma = (\sigma_1, \dots, \sigma_n)$ the payoff is an expectation over the independently drawn actions,

$$u_i(\sigma) = \sum_{a \in A_1 \times \cdots \times A_n} \Big(\prod_{j=1}^{n} \sigma_j(a_j)\Big)\, u_i(a).$$

Randomization is not a trick; it is sometimes the only stable way to play. In a pursuit game, any deterministic policy is a pattern an opponent can learn and exploit, so being unpredictable, mixing, is itself the optimal behavior. This is why a poker bot bluffs with a fixed frequency and a load balancer may randomize placement: a predictable agent in an adversarial setting hands its opponent a free model of itself. The deep result that some games have no stable pure-strategy outcome but always a stable mixed one is the subject of Section 28.3; here we only need to know that strategies can be randomized and that payoffs then become expectations.

Fun Note: The Only Winning Move Is to Flip a Coin

Rock-paper-scissors has no good deterministic strategy: any fixed choice loses to one reply with certainty. The unique stable play is the uniform mixed strategy, each option a third of the time. A bot that "reads" your bias wins not because it is smart but because you were not random enough, which is the entire premise of the bots that routinely beat humans at the game by detecting the patterns no human can suppress.

4. Zero-Sum, General-Sum, and the Motive of the Game Beginner

Independent of how a game is drawn, the structure of its payoffs decides what kind of reasoning it demands. A game is zero-sum (more precisely, constant-sum) when the players' payoffs always add to the same constant, so one agent's gain is exactly another's loss: $\sum_i u_i(a) = c$ for every profile $a$. Chess and two-player poker are zero-sum; there is nothing to share and the interaction is purely adversarial. A general-sum game lifts that restriction, and most real systems are general-sum, because the payoffs can rise or fall together.

General-sum games split by motive. A purely cooperative game (a team game) gives all players the identical payoff, so they win or lose together and the only difficulty is coordination, exactly the regime of cooperative multi-agent learning in Chapter 30. A purely competitive game is the zero-sum case. The interesting and common middle is mixed-motive: players share some interest in a good joint outcome but conflict over which good outcome, like two autonomous vehicles that both want to avoid a collision (aligned) yet each prefer the other yield (opposed). The payoff matrix in Figure 28.2.1 is mixed-motive: both players prefer to coordinate on the same letter over miscoordinating, but the row player prefers (A, A) while the column player prefers (B, B). Naming the motive tells an AI designer whether to build for coordination, for adversarial robustness, or for the harder negotiation that mixed motives force, the subject of Chapter 29.

5. Both Forms in Code, and Why Trees Explode Intermediate

The code below builds both representations from scratch and operates on them. It encodes the normal-form matrix of Figure 28.2.1 as a dictionary from joint profiles to payoff pairs, computes the expected payoff of a mixed profile, then builds the perfect-information version of the tree and solves it by backward induction: starting at the leaves, each mover picks the child that maximizes their own utility, and the choice propagates up to the root. Finally it expands the same tree into its equivalent normal form to show the size gap.

row_actions = ["A", "B"]
col_actions = ["A", "B"]
payoff = {("A","A"): (2,1), ("A","B"): (0,0),
          ("B","A"): (0,0), ("B","B"): (1,2)}   # (u_row, u_col) per joint profile

# --- Normal form: the payoff for every joint action profile ---
print("=== Normal form: payoff for every joint action profile ===")
print("   profile | row util | col util")
for ar in row_actions:
    for ac in col_actions:
        ur, uc = payoff[(ar, ac)]
        print(f"       {ar},{ac} | {ur:8d} | {uc:8d}")

# Expected payoff of a MIXED profile: row plays A w.p. p, col plays A w.p. q.
def expected_payoffs(p, q):
    er = ec = 0.0
    for ar, pr in zip(row_actions, (p, 1 - p)):
        for ac, qc in zip(col_actions, (q, 1 - q)):
            ur, uc = payoff[(ar, ac)]
            er += pr * qc * ur          # weight each profile by its joint probability
            ec += pr * qc * uc
    return er, ec

er, ec = expected_payoffs(0.5, 0.5)
print(f"\nMixed profile p(row=A)=0.5, q(col=A)=0.5: E[row]={er}, E[col]={ec}")

# Perfect-information game tree: P1 moves (L/R), then P2 moves (l/r), leaves hold payoffs.
game_tree = {"player": 1, "children": {
    "L": {"player": 2, "children": {"l": {"leaf": (3,1)}, "r": {"leaf": (1,2)}}},
    "R": {"player": 2, "children": {"l": {"leaf": (0,0)}, "r": {"leaf": (2,3)}}}}}

# Backward induction: each mover maximizes ITS OWN coordinate, leaves -> root.
def backward_induction(node):
    if "leaf" in node:
        return node["leaf"], []
    mover = node["player"]
    best_payoff, best_path = None, None
    for action, child in node["children"].items():
        payoff_vec, sub = backward_induction(child)
        if best_payoff is None or payoff_vec[mover - 1] > best_payoff[mover - 1]:
            best_payoff, best_path = payoff_vec, [(mover, action)] + sub
    return best_payoff, best_path

eq_payoff, eq_path = backward_induction(game_tree)
print("\n=== Extensive form: backward induction (perfect information) ===")
for mover, action in eq_path:
    print(f"Player {mover} chooses '{action}'")
print(f"Subgame-perfect outcome payoff: P1={eq_payoff[0]}, P2={eq_payoff[1]}")

# Equivalent normal form: P2's pure strategy is a (reply-to-L, reply-to-R) pair.
p1_strategies = ["L", "R"]
p2_strategies = [(x, y) for x in ("l", "r") for y in ("l", "r")]
print("\n=== Equivalent normal form of the tree ===")
print("Player 1 pure strategies :", p1_strategies)
print(f"Player 2 pure strategies : {p2_strategies}  ({len(p2_strategies)} of them)")
for p1 in p1_strategies:
    for p2 in p2_strategies:
        reply = p2[0] if p1 == "L" else p2[1]   # P2's action on the branch P1 chose
        leaf = game_tree["children"][p1]["children"][reply]["leaf"]
        print(f"  P1={p1}, P2={p2} -> payoff {leaf}")
Code 28.2.1: One small interaction in both representations: a normal-form payoff dictionary with a mixed-strategy expectation, and a perfect-information tree solved by backward induction, then expanded to its (larger) equivalent normal form. The mover at each tree node maximizes coordinate mover - 1 of the payoff vector, which is what makes induction self-interested rather than cooperative.
=== Normal form: payoff for every joint action profile ===
   profile | row util | col util
       A,A |        2 |        1
       A,B |        0 |        0
       B,A |        0 |        0
       B,B |        1 |        2

Mixed profile p(row=A)=0.5, q(col=A)=0.5: E[row]=0.75, E[col]=0.75

=== Extensive form: backward induction (perfect information) ===
Player 1 chooses 'R'
Player 2 chooses 'r'
Subgame-perfect outcome payoff: P1=2, P2=3

=== Equivalent normal form of the tree ===
Player 1 pure strategies : ['L', 'R']
Player 2 pure strategies : [('l', 'l'), ('l', 'r'), ('r', 'l'), ('r', 'r')]  (4 of them)
  P1=L, P2=('l', 'l') -> payoff (3, 1)
  P1=L, P2=('l', 'r') -> payoff (3, 1)
  P1=L, P2=('r', 'l') -> payoff (1, 2)
  P1=L, P2=('r', 'r') -> payoff (1, 2)
  P1=R, P2=('l', 'l') -> payoff (0, 0)
  P1=R, P2=('l', 'r') -> payoff (2, 3)
  P1=R, P2=('r', 'l') -> payoff (0, 0)
  P1=R, P2=('r', 'r') -> payoff (2, 3)
Output 28.2.1: Backward induction predicts Player 1 plays R because it foresees Player 2 will then choose r for payoff 3, yielding Player 1 a 2 (better than the 1 it would get on the L branch). The crucial line is the last block: a tree with only two of Player 2's decision nodes already produces four distinct pure strategies for Player 2, so the equivalent matrix is larger than the tree that generated it.

That last block is the lesson the section has been building toward. Player 2 faces two decision nodes, and a pure strategy must name a reply at each, so the number of pure strategies is $2 \times 2 = 4$ from a tree with two relevant nodes. In general, if a player acts at $m$ information sets, each offering $b$ actions, that player has $b^{m}$ pure strategies, and the normal form lists every combination across all players. The matrix is therefore exponentially larger than the tree, which is why no serious game-playing system enumerates it.

Thesis Thread: Representation Decides Whether You Search or Enumerate

The exponential gap between a game tree and its matrix is the same pressure that pushed AI off a single machine in Section 1.1: a description outgrows the resource that must hold it, and the only way forward is to stop materializing the whole thing. You cannot enumerate the normal form of chess (its strategy count dwarfs the atoms in the universe), so game-playing AI searches the tree directly, prunes it, and samples it with Monte Carlo rollouts. The choice of extensive over normal form is precisely what makes that search-and-sample approach available, and it is why the sequential, tree-shaped view connects directly to the Markov games of Chapter 30, where the tree becomes a stochastic state graph an agent learns over rather than enumerates.

Library Shortcut: OpenSpiel Carries Both Forms for You

Code 28.2.1 hand-rolled a tree, a backward-induction solver, and a matrix expansion. DeepMind's OpenSpiel ships hundreds of games already encoded in extensive form, exposes the information-set structure, and provides solvers (best response, counterfactual regret minimization, backward induction) out of the box. The same two-player tree, with traversal and payoffs handled for you, is a few lines:

# pip install open_spiel
import pyspiel

game = pyspiel.load_game("kuhn_poker")     # a classic imperfect-information tree
state = game.new_initial_state()
print(game.num_players(), "players, information sets tracked automatically")
while not state.is_terminal():
    a = state.legal_actions()[0]           # walk one path down the extensive form
    state.apply_action(a)
print("returns:", state.returns())         # payoff vector at the reached leaf
Code 28.2.2: OpenSpiel replaces the roughly forty lines of Code 28.2.1 with a loaded game object that already knows its tree, information sets, legal actions, and payoffs. You write the agent logic; the library owns the representation, including the imperfect-information bookkeeping that backward induction alone cannot handle.
Practical Example: Choosing a Representation for a Cloud Spot-Bidding Agent

Who: An ML platform engineer building an agent that bids for preemptible (spot) GPU instances against other tenants' agents.

Situation: Each scheduling round, several agents submit bids for a shared pool of cheap GPUs; the highest bidders win, and the rest fall back to expensive on-demand capacity.

Problem: The team first modeled the contest as a sequential extensive-form game, with agents bidding in observed turns, and the solver was unusable: the tree branched on every possible bid of every rival at every round.

Dilemma: Keep the faithful sequential tree, which captures who-saw-what but explodes combinatorially, or collapse to a normal-form sealed-bid matrix, which loses the turn order but is small and solvable.

Decision: They modeled one scheduling round as a simultaneous normal-form game, because bids are in fact submitted without seeing rivals' bids, so the sequential tree was encoding an order that did not exist.

How: Each agent's action set became a discrete grid of bid levels; the payoff function combined the probability of winning at each bid against the cost saved versus on-demand, exactly the $u_i$ over joint profiles of Section 1.

Result: The normal-form model solved in milliseconds per round and produced a randomized (mixed) bidding policy that was robust to rivals learning the pattern, where the tree-based attempt had never finished.

Lesson: Match the representation to the real information structure. A simultaneous sealed-bid interaction is normal form; forcing it into an extensive-form tree manufactures branching that buys no fidelity and destroys tractability.

6. Why This Matters for Distributed AI Intermediate

Every later model in this part is one of these two forms or a generalization. When a single state and a one-shot interaction become a sequence of states with stochastic transitions, the extensive form generalizes into a Markov game (a stochastic game), which is the formal substrate of multi-agent reinforcement learning in Chapter 30. There, agents do not receive the tree and enumerate it; they learn a policy over states by interacting, precisely because the tree is too large to write down, and the actor-learner infrastructure of Chapter 20 is what lets that learning run across many machines. The negotiation, auction, and task-allocation protocols of Chapter 29 are, underneath, normal-form and extensive-form games whose payoff functions encode the agents' competing objectives. Fixing the representation now is what lets those chapters reason about coordination and conflict without re-deriving the scaffolding each time.

Research Frontier: Scaling Solvers on Imperfect-Information Trees (2024 to 2026)

The hardest games for AI are large extensive-form games with imperfect information, where backward induction does not apply because a mover cannot tell which node of its information set it occupies. The dominant approach, counterfactual regret minimization (CFR), drove the superhuman poker agents Libratus and Pluribus, and the live research effort is making it scale: ReBeL (Brown et al.) unified search with self-play reinforcement learning over belief states, and 2024 to 2026 work pushes depth-limited and continuous-action solving on ever-larger trees, alongside no-regret learning analyses for the multi-agent settings that Chapter 30 trains. DeepMind's OpenSpiel remains the common benchmark substrate, and the open problem is the one this section flagged: the tree is astronomically large, so every advance is really a smarter way to avoid enumerating the equivalent normal form.

We now have the two objects every game-theoretic model rests on, the payoff matrix and the game tree, plus the strategy vocabulary (pure and mixed) and the motive vocabulary (zero-sum, cooperative, competitive, mixed-motive) to classify any interaction we meet. What we do not yet have is a notion of a good joint outcome: given the matrix, which profile should rational agents actually settle on? That question, and the equilibrium concepts that answer it, is the subject of Section 28.3.

Exercise 28.2.1: Classify the Motive Conceptual

For each interaction, state whether it is best modeled as normal form or extensive form, whether it is zero-sum, cooperative, competitive, or mixed-motive, and whether it has perfect or imperfect information: (a) two data-center cooling agents that both want low total energy use but each controls a different zone; (b) two-player heads-up poker; (c) chess; (d) two ad-bidding agents in a sealed-bid auction for the same ad slot. Justify the information-set classification in each case.

Exercise 28.2.2: Add an Information Set Coding

Modify Code 28.2.1 so Player 2's two decision nodes form a single information set (imperfect information: Player 2 cannot see whether Player 1 played L or R). Explain why backward_induction as written is no longer valid in this case, then implement a brute-force solver that, for each of Player 2's four pure strategies, computes the resulting payoff and reports which strategy maximizes Player 2's utility given each Player 1 choice. Compare the outcome to the perfect-information result in Output 28.2.1.

Exercise 28.2.3: Count the Explosion Analysis

A player acts at $m$ information sets, each offering $b$ actions. Show that the player has $b^{m}$ pure strategies, and that an $n$-player game where every player has this structure has a normal form with $b^{mn}$ cells. Plug in $b = 3$, $m = 10$, $n = 2$ and compare the cell count to the number of nodes in the underlying tree (at most $b^{2m}$ leaves for two alternating players). Argue from these numbers why a system must search or sample the extensive form rather than build its matrix, connecting back to the resource-ceiling argument of Section 1.1.