"Nobody on the team decided to drive off the cliff. We took a vote, and the vote was unanimous, and the vote was wrong, and that is exactly the problem."
A Particle Stuck in a Local Optimum Everyone Agreed On
The same feedback, stigmergy, and imitation that let a swarm compute answers no single agent could reach can, with the sign of one loop flipped, drive the whole collective into a confident, coordinated mistake that no individual intended and no individual can fix. Every competence in this chapter has a failure mode that is its mirror image: positive feedback that amplifies a good trail also amplifies a wrong one (a herding cascade), convergence that finds the optimum also collapses diversity onto a suboptimal one (stagnation), the local coupling that produces graceful flocking also produces gridlock and oscillation, and the redundancy that shrugs off random crashes can be steered by a handful of correlated or malicious agents. This closing section is the inverse of Section 31.8: there we asked how local rules produce useful global order; here we ask how the same rules produce dangerous global order, and what monitoring, diversity, damping, and circuit-breaking it takes to keep emergent behavior safe rather than merely assumed benign.
Through this chapter, emergence has been the hero. Ant colony optimization (Section 31.3) and particle swarm optimization (Section 31.4) turned simple local rules and reinforcement into global search; flocking (Section 31.5) and collective perception (Section 31.6) turned local sensing into coordinated motion and shared estimates; emergent communication (Section 31.7) and decentralized coordination (Section 31.8) let agents organize with no manager at all. The uncomfortable truth this section confronts is that emergence does not come with a sign attached. The feedback loop that concentrates pheromone on the shortest path is structurally identical to the loop that concentrates a market's whole order flow on a mispriced asset; the imitation that lets a flock turn as one is the imitation that lets a rumor become consensus. A collective system can fail in ways that are invisible at the level of any single agent, because the failure lives in the coupling between agents, not inside any one of them. That is what makes these failures hard to predict, hard to debug, and dangerous to assume away.
1. Herding and Information Cascades: Positive Feedback Turned Against You (Intermediate)
The cleanest collective failure is the information cascade, and it is dangerous precisely because each agent in it behaves rationally. Picture agents deciding in sequence whether to adopt an option, each holding one private signal that is right more often than not, and each able to see the public actions of everyone before it but not their private signals. The classic model of Bikhchandani, Hirshleifer, and Welch makes the trap exact. A rational agent treats the public tally of prior actions as evidence: if enough predecessors have adopted, that public count outweighs a single contrary private signal, so the agent adopts regardless of what its own signal says. The moment the tally reaches that threshold, every subsequent agent is in the same position and copies too, and from then on no private signal ever enters the public record again. The collective has converged, confidently, on whatever the first one or two signals happened to be, which with non-trivial probability is wrong.
The mathematics is a Bayesian threshold. Let the private signal be correct with probability $p > \tfrac{1}{2}$, and let the public record show a net tally $t$ (adoptions minus rejections) among prior agents who acted on information. Each informative action carries the same log-likelihood weight as one private signal, so an agent compares its single private signal against $t$ accumulated signals. Once $|t| \ge 2$ and the public lean opposes the agent's signal, the posterior strictly favors copying (at $|t| = 1$ an opposing signal exactly cancels the lean and leaves the agent indifferent):
$$\Pr[\text{adopt is correct} \mid t,\, s] \;=\; \frac{p^{\,n_+}\,(1-p)^{\,n_-}}{p^{\,n_+}(1-p)^{\,n_-} + (1-p)^{\,n_+}\,p^{\,n_-}}, \qquad n_+ - n_- = t + s,$$
where $n_+$ and $n_-$ count the signals pointing each way (those inferred from the public record plus the agent's own), $s = \pm 1$ encodes the agent's private signal, and the two world states are taken to be equally likely a priori. The posterior exceeds $\tfrac12$ exactly when $n_+ - n_- > 0$. When $t$ already exceeds one in magnitude, flipping the agent's lone signal cannot change the sign of that difference, so the rational action ignores private evidence entirely. That is the cascade: information stops flowing the instant imitation becomes individually optimal. It is the exact inverse of the pheromone reinforcement in Section 31.3, where positive feedback concentrating probability on one path was the engine of competence. Here the same concentration is the engine of error, and the system is most confident exactly when it has stopped learning.
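To make the threshold concrete, the following minimal sketch evaluates the posterior above directly. It assumes equal prior probability for the two world states; the function name `posterior_adopt_correct` and the example value $p = 0.7$ are illustrative choices, not part of the model's original statement.

```python
def posterior_adopt_correct(p: float, n_plus: int, n_minus: int) -> float:
    """Posterior that 'adopt' is correct given n_plus adopt-signals and
    n_minus reject-signals, each independently correct with probability p.
    Assumes equal prior probability for the two world states."""
    like_good = p ** n_plus * (1 - p) ** n_minus   # likelihood if adopt is right
    like_bad = (1 - p) ** n_plus * p ** n_minus    # likelihood if reject is right
    return like_good / (like_good + like_bad)


p = 0.7
for tally in (1, 2, 3):
    # The public record implies `tally` net adopt-signals;
    # the agent's own private signal says "reject".
    post = posterior_adopt_correct(p, n_plus=tally, n_minus=1)
    print(f"public tally +{tally}, private signal 'reject': posterior = {post:.3f}")
```

At a tally of $+1$ the contrary signal leaves the posterior at exactly one half (indifference); from $+2$ onward the posterior stays above one half whatever the private signal says, which is the point at which the private signal stops mattering.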
An independent crowd gets smarter as it grows: aggregating $n$ signals of accuracy $p > \tfrac12$ drives the majority's accuracy toward $1$ as $n \to \infty$ (the Condorcet jury theorem). A herding crowd does the opposite. Once the cascade locks, every later agent is a copy, so the collective carries no more information than the handful of signals that formed before the lock. Adding agents to a herding system does not add accuracy; it only adds confidence to whatever the first few agents decided. The pathology is not that agents are irrational, it is that individually rational imitation destroys the diversity that made aggregation work in the first place. Diversity is not a nicety in a collective system; it is the resource that herding consumes.
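The Condorcet side of that contrast can be computed exactly rather than simulated. The sketch below is a hedged illustration: it sums the binomial tail for a strict majority of $n$ independent voters, using odd $n$ to sidestep ties. The function name `majority_accuracy` and the sample crowd sizes are assumptions made for the example.

```python
from math import comb


def majority_accuracy(n: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is correct,
    when each voter is correct with probability p. Use odd n to avoid ties."""
    need = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


for n in (1, 3, 11, 41, 101):
    print(f"n = {n:3d}  independent-majority accuracy = {majority_accuracy(n, 0.6):.3f}")
```

Under independence the accuracy climbs with every added voter. Under a cascade the same formula applies only to the few signals that entered the record before the lock, so growing $n$ changes nothing.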
The runnable demo below makes the gap measurable. It simulates the cascade in its starkest form, in which any nonzero public lean overrides the private signal (the indifference case at a tally of one is resolved by copying), then re-runs the identical agents and signals under one change: each agent acts on its own private signal instead of copying the public tally, injecting independent information back into the record. The collective accuracy is measured over twenty thousand random worlds for each regime.
import random

# Bikhchandani-Hirshleifer-Welch information cascade.
# A hidden world state W in {GOOD, BAD}. Each agent gets ONE private binary
# signal that is correct with probability p > 0.5, then acts in sequence,
# seeing only the PUBLIC actions of predecessors (never their private signal).
# A rational agent follows the public tally when it outweighs one private signal.
# Simplification: this demo lets ANY nonzero tally override the private signal
# (the tie at |tally| = 1 is broken toward the public lean), so the cascade
# locks after the first action and can lock onto the WRONG action.
P = 0.60        # private-signal accuracy
N = 40          # agents acting in sequence
TRIALS = 20000


def one_run(world, herding=True):
    up = 0  # public tally = (# adopt) - (# reject) so far
    actions = []
    for _ in range(N):
        signal = 1 if random.random() < P else 0  # 1 = "adopt is right"
        if world == 0:  # if world is BAD, the correct private read is reject
            signal = 1 - signal
        if herding:
            # Herding agent: copy the public lean whenever the tally is nonzero;
            # fall back on the private signal only when the tally is exactly 0.
            if up > 0:
                act = 1
            elif up < 0:
                act = 0
            else:
                act = signal
        else:
            act = signal  # MITIGATION: always act on OWN private signal
        actions.append(act)
        up += 1 if act == 1 else -1
    return 1 if sum(actions) * 2 >= N else 0  # collective = majority (ties count as adopt)


def evaluate(herding):
    correct = 0
    for _ in range(TRIALS):
        world = random.randint(0, 1)  # GOOD=1, BAD=0
        correct += (one_run(world, herding) == world)
    return correct / TRIALS


# Re-seed before each regime so both see the same worlds and the same signals.
random.seed(2024)
herd_acc = evaluate(herding=True)
random.seed(2024)
indep_acc = evaluate(herding=False)

print(f"private-signal accuracy p : {P:.2f}")
print(f"agents per run N : {N}")
print(f"herding (copy predecessors) : {herd_acc:.3f}")
print(f"diversity (own signal only) : {indep_acc:.3f}")
print(f"accuracy gained by diversity : {indep_acc - herd_acc:+.3f}")
Code 31.9.1: The two regimes differ only in the rule that sets act: copy the public tally (herding) or act on the private signal (diversity). Everything else, the worlds, the signals, the seed, is held identical.

private-signal accuracy p : 0.60
agents per run N : 40
herding (copy predecessors) : 0.600
diversity (own signal only) : 0.903
accuracy gained by diversity : +0.303
The numbers tell the whole story of this section in miniature. Forty herding agents achieve $0.600$ accuracy, which is the accuracy of a single agent: the cascade has thrown away thirty-nine signals' worth of information. The same forty agents, acting independently, reach $0.903$, because now the majority aggregates all forty signals as the Condorcet jury theorem promises. The mitigation is not cleverer agents or more agents; it is preserved diversity. Every cure in the rest of this section is a variation on that one move, applied to a different failure.
2. Premature Convergence and the Death of Diversity (Intermediate)
Herding is the sequential, social version of a failure that the optimization swarms of this chapter suffer in their own right: premature convergence. A particle swarm (Section 31.4) is pulled toward the best position any particle has found; an ant colony (Section 31.3) lays pheromone on the paths it has already taken. Both are positive-feedback search, and positive feedback has one failure mode by construction. If the swarm finds a decent-but-suboptimal solution early, every agent is drawn toward it, the spread of the population collapses, and with no agent exploring anything different there is no longer any source of new information to escape the basin. The swarm has converged, but onto the wrong optimum, and it cannot tell, because from inside the basin every direction looks worse. In PSO this shows up as particle velocities decaying to zero around a local optimum; in ACO it shows up as one trail saturating the pheromone matrix so completely that alternative edges are never sampled again. The system is stuck not because it ran out of compute but because it ran out of diversity.
The structural diagnosis is identical to the cascade: a feedback loop has consumed the variance that the search needed to keep working. This is why the standard mitigations for premature convergence are all diversity-preservation mechanisms. ACO uses pheromone evaporation, a slow decay that erases stale reinforcement so old trails do not dominate forever; modified versions impose minimum and maximum pheromone bounds (the MAX-MIN ant system) so no edge can ever reach probability one or fall to zero. PSO injects mutation, restarts stagnant particles, or maintains explicit sub-swarms that are forbidden from collapsing into each other. Each of these is, mechanically, a negative-feedback term added to a positive-feedback system: a force that pushes diversity back up whenever it falls too far. The art of designing a robust swarm is balancing the two so the system converges fast enough to be useful but never so completely that it goes blind.
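The interplay of evaporation, reinforcement, and bounds can be sketched on a two-edge toy problem. This is a simplified illustration under stated assumptions, not the full MAX-MIN Ant System: the names `update_pheromone`, `choice_probs`, and `run`, the evaporation rate `rho = 0.1`, and the bound values are all made up for the example, and edge 0 is reinforced at every step to mimic a colony that has already locked in.

```python
def update_pheromone(tau, chosen, rho, deposit, tau_min=0.0, tau_max=float("inf")):
    """One reinforcement step: evaporate every edge by factor (1 - rho),
    deposit on the chosen edge, then clamp each value into [tau_min, tau_max]."""
    new_tau = []
    for i, t in enumerate(tau):
        t = (1.0 - rho) * t                     # evaporation: negative feedback
        if i == chosen:
            t += deposit                        # reinforcement: positive feedback
        new_tau.append(min(max(t, tau_min), tau_max))  # MAX-MIN style bounds
    return new_tau


def choice_probs(tau):
    """Selection probabilities proportional to pheromone (heuristic term omitted)."""
    total = sum(tau)
    return [t / total for t in tau]


def run(steps, **bounds):
    tau = [1.0, 1.0]                            # two competing edges, equal start
    for _ in range(steps):
        tau = update_pheromone(tau, chosen=0, rho=0.1, deposit=1.0, **bounds)
    return choice_probs(tau)


print("unbounded:", run(200))
print("bounded  :", run(200, tau_min=0.1, tau_max=1.0))
```

In this worst case the unbounded run drives the probability of the neglected edge to essentially zero, while the bounded run pins it at `tau_min / (tau_min + tau_max)`, so the alternative keeps being sampled and the colony retains a way out of the basin.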
There is a recognizable human version of premature convergence: the meeting where the first plausible suggestion becomes the plan, not because it was compared against alternatives but because comparing would have meant someone dissenting, and dissent felt expensive. The pheromone here is social, not chemical, and the evaporation term is the colleague brave enough to say "before we lock this in, has anyone got a different one?" Swarms without an evaporation term and teams without that colleague fail the same way, with great confidence and identical velocity vectors.
3. Oscillation, Deadlock, and Livelock: When the Loop Never Settles (Advanced)
Not every collective failure ends in a frozen wrong answer; some never settle at all. Feedback loops with the wrong gain or the wrong delay oscillate. Imagine a fleet of load-balancing agents that each route requests toward the least-loaded server they can see, using load information that is a few seconds stale. They all observe the same idle server, all stampede toward it, overload it, then all observe it is now the busiest and stampede away, leaving it idle again. Nothing is broken, every agent is following a sensible local rule, and yet the global load swings forever, because the staleness delay turns a corrective response into reinforcement: by the time the agents react, the imbalance they are correcting has already reversed. This is the same instability that flocking (Section 31.5) must engineer against: a flock that over-corrects to its neighbors' velocity does not glide, it shudders. The cure is damping, a negative-feedback term proportional to the rate of change, plus randomization to break the synchrony that lets every agent move in lockstep.
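The role of gain and delay can be seen in a deliberately stripped-down model. The sketch below is a linear toy, not a load balancer: it tracks a single imbalance value between two servers and assumes the fleet corrects a fraction `gain` of whatever imbalance it observed `delay` ticks earlier. The function names and parameter values are hypothetical.

```python
def simulate_imbalance(gain: float, delay: int, steps: int = 60, start: float = 1.0):
    """Toy linear model of load imbalance d between two servers.
    Each tick the fleet corrects a fraction `gain` of the imbalance it *observed*
    `delay` ticks ago:  d[t+1] = d[t] - gain * d[t - delay]."""
    history = [start] * (delay + 1)             # assume a constant imbalance before t = 0
    for _ in range(steps):
        stale = history[-1 - delay]             # what the agents can actually see
        history.append(history[-1] - gain * stale)
    return history


def late_amplitude(history, window: int = 20) -> float:
    """Largest absolute imbalance over the final `window` ticks."""
    return max(abs(d) for d in history[-window:])


for gain, delay in [(1.0, 0), (1.0, 1), (0.3, 1), (1.5, 1)]:
    amp = late_amplitude(simulate_imbalance(gain, delay))
    print(f"gain={gain:.1f} delay={delay}: late amplitude = {amp:.3g}")
```

With fresh information, full correction settles in one tick. With one tick of staleness the same full gain cycles forever at undiminished amplitude, a lower gain (each agent moving only part of the way, or only with some probability) decays to balance, and a higher gain diverges.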
Deadlock and livelock are the discrete cousins of oscillation. In deadlock, agents mutually block and nothing moves: two robots in a narrow corridor each waiting for the other to back up, or a ring of agents each holding the resource its neighbor needs. In livelock, agents keep moving but make no progress: the two robots both step aside, both into the same direction, collide again, both step aside again, politely and forever. A robot swarm crossing a doorway can jam exactly like cars at a four-way stop with no agreed precedence, an emergent gridlock that no robot intended and that gets worse, not better, as you add robots. The mitigations are the classics of distributed systems applied to physical agents: randomized back-off so two blocked agents do not retry in perfect sync, priority or token schemes that break symmetry, and timeouts that force a stuck agent to abandon its plan and re-randomize. The connection to the decentralized coordination of Section 31.8 is direct: coordination without a central controller buys robustness, but it also removes the referee that could have broken the tie, so the tie-breaking must be designed into the local rules themselves.
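The two-robot livelock and its randomized cure fit in a few lines. This is a minimal sketch under simplifying assumptions: the corridor has exactly two sides, both robots choose simultaneously each round, and they pass as soon as they pick different sides. The function name `rounds_until_pass` and the round limit are illustrative.

```python
import random


def rounds_until_pass(randomized: bool, rng: random.Random, max_rounds: int = 1000):
    """Two robots meet head-on and each sidesteps to side 0 or 1 (corridor frame).
    They pass only when they pick different sides. Returns the round on which
    they pass, or None if still blocked after max_rounds (livelock)."""
    for round_no in range(1, max_rounds + 1):
        if randomized:
            a = rng.randrange(2)                # independent coin flips break the symmetry
            b = rng.randrange(2)
        else:
            a = b = round_no % 2                # identical deterministic rule: always the same side
        if a != b:
            return round_no
    return None


rng = random.Random(0)
print("deterministic rule:", rounds_until_pass(False, rng))
print("randomized rule   :", rounds_until_pass(True, rng))
```

Identical deterministic rules keep the robots mirrored indefinitely, so the first call returns `None`. Independent coin flips give each round an even chance of resolving the encounter, so the expected wait is two rounds; the same reasoning motivates randomized back-off in network protocols.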
4. Emergent Undesired Behavior: Flash Crashes and the Inverse Problem Gone Dangerous (Advanced)
The most unsettling collective failures are the ones no designer put in and no designer can easily reproduce. Section 31.8 posed the inverse problem of swarm design as a benign puzzle: given a desired global behavior, find local rules that produce it. Run that puzzle in reverse and it turns dangerous, because a set of local rules chosen for one global behavior can also produce a completely different, unintended one under conditions the designer never tested. Markets of automated trading agents are the canonical example. Each agent follows a sensible local rule (sell when price falls past a threshold, buy when it rises), and in normal conditions the population is stable. Under a particular shock, those same thresholds chain: one agent's sell pushes the price past a second agent's threshold, whose sell pushes it past a third's, and the price collapses in seconds with no fundamental cause, a flash crash that emerges entirely from the coupling between rules that were individually prudent. The 2010 equity flash crash and numerous smaller ones since are this dynamic, and they are genuinely hard to debug because the bug is not in any agent; it is in the interaction, which appears only at population scale under conditions that may never recur identically.
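The threshold-chaining mechanism can be reproduced with a handful of stop-loss agents. The sketch below is a toy under stated assumptions, not a market model: prices are integer ticks, each agent sells exactly once when the price reaches its threshold, and every sale moves the price by a fixed `impact`. All names and numbers are hypothetical.

```python
def run_cascade(start_price, thresholds, impact, shock):
    """Stop-loss cascade in integer price ticks.
    Agent i sells once when price <= thresholds[i]; each sale lowers the price
    by `impact`. Returns (final_price, number_of_agents_that_sold)."""
    price = start_price - shock                 # exogenous shock starts things off
    sold = [False] * len(thresholds)
    while True:
        triggered = [i for i, thr in enumerate(thresholds)
                     if not sold[i] and price <= thr]
        if not triggered:
            return price, sum(sold)
        for i in triggered:
            sold[i] = True
        price -= impact * len(triggered)        # selling pressure feeds back into price


thresholds = [9900 - 50 * i for i in range(20)]   # stop-losses spaced 50 ticks apart
print("impact 40:", run_cascade(10000, thresholds, impact=40, shock=100))  # -> (9860, 1)
print("impact 60:", run_cascade(10000, thresholds, impact=60, shock=100))  # -> (8700, 20)
```

The agents and the shock are identical in both runs. Only the coupling differs: when one sale moves the price less than the gap to the next threshold, the shock is absorbed after a single sale; when it moves the price more, every agent's individually prudent rule fires in turn and the whole population sells.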
This is the general lesson and the reason this section exists. The competence of an emergent system and its capacity for catastrophe come from the same place: behavior that is not specified in any agent but produced by their interaction. You cannot unit-test an emergent failure, because it is not a property of any unit. You cannot always reproduce it on demand, because it depends on a configuration of the whole population. And you cannot assume it away, because the very feature that makes the swarm useful, that the global behavior exceeds the sum of the local rules, is what makes the global behavior able to exceed the designer's intentions. Emergent collective behavior must be treated as a system property to be monitored at the collective level, not a sum of agent properties that individual testing can certify safe.
Who: A reliability and trust team at a video platform whose ranking is driven by a fleet of agents optimizing watch-time.
Situation: Each ranking agent followed a simple local rule, promote content similar to what a user engaged with, and the fleet jointly produced each user's feed.
Problem: For a slice of users, recommendations narrowed and intensified over weeks toward extreme content, a collective drift no single agent's rule described and no engineer had designed.
Dilemma: Treat it as a per-agent bug and tune individual ranking rules (which testing showed were each behaving as specified), or treat it as an emergent population-level dynamic requiring a collective-level intervention.
Decision: They diagnosed it as a positive-feedback cascade, engagement reinforcing similarity reinforcing engagement, structurally the pheromone loop of Section 31.3 with no evaporation term, and intervened at the collective level.
How: They added a diversity-injection term (randomized exploration into the recommendation slate), a damping cap on how fast any topic could come to dominate a feed, and a monitor that watched the population-level concentration metric, not per-agent behavior.
Result: The runaway narrowing flattened, measured by a drop in feed-concentration variance, without a measurable cost to overall engagement, because the diversity term restored the exploration the loop had consumed.
Lesson: The failure was invisible at the agent level and obvious at the collective level. The fixes were the swarm fixes of this section, evaporation, damping, and collective monitoring, applied to an agent fleet that nobody had thought of as a swarm.
5. Robustness Paradoxes: Fragile to Correlation and Malice (Advanced)
Swarms are celebrated for robustness, and the praise is earned but narrow. A swarm tolerates random, independent failures beautifully: lose a tenth of the ants, the colony still finds the path; lose a few drones from a flock, the flock reforms. This is because the global behavior is a statistical aggregate over many agents, and removing a random subset barely shifts the aggregate. The paradox is that the very property granting this robustness, that no single agent matters because the collective averages over all of them, also describes the system's fragility: if failures are not independent but correlated, or if some agents are not failing but actively lying, the average itself moves, and the swarm follows it confidently off the cliff. A swarm is robust to random noise and fragile to structured attack, and these are not opposite claims; they are the same claim about averages, read in two directions.
Correlated failure is the benign version. If a shared sensor model, a common software bug, or a single upstream data feed makes many agents wrong in the same direction at the same time, the aggregate is biased rather than merely noisy, and no amount of swarm size averages out a bias. The malicious version is sharper. A handful of Byzantine agents that report plausible but false signals can steer a consensus or a collective estimate, because the honest agents, designed to trust the aggregate, fold the lies into it. Collective perception (Section 31.6) and the consensus of Section 31.5 are exactly the targets: a few agents reporting a fabricated observation can shift a shared estimate the whole swarm then acts on. The defenses are the subject of Chapter 35 on Byzantine-robust aggregation, where the plain average is replaced by a robust statistic (a coordinate-wise median, a trimmed mean, Krum) that a minority of adversaries cannot drag, and they connect directly to the trust and reputation mechanisms of Chapter 29 that let honest agents down-weight the ones that have lied before. The swarm-design lesson is that robustness to random failure is free and robustness to adversarial failure is not; the latter must be engineered in, by replacing every "trust the average" with "trust a statistic the adversary cannot move."
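A small numeric illustration shows the difference between an average and a statistic the adversary cannot move. The readings below are made up, the scenario is a single scalar estimate, and `trimmed_mean` is a hypothetical helper written for this sketch (Krum, which targets vector-valued updates, is not shown).

```python
from statistics import mean, median


def trimmed_mean(values, trim: int) -> float:
    """Mean after discarding the `trim` smallest and `trim` largest values."""
    if 2 * trim >= len(values):
        raise ValueError("trim removes every value")
    kept = sorted(values)[trim:len(values) - trim]
    return mean(kept)


honest = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]   # made-up readings near 10
byzantine = [1e6, 1e6]                                    # two colluding liars
reports = honest + byzantine

print(f"mean         = {mean(reports):.2f}")
print(f"median       = {median(reports):.2f}")
print(f"trimmed mean = {trimmed_mean(reports, trim=2):.2f}")
```

Two liars out of ten drag the plain mean to roughly two hundred thousand, while the median and the trimmed mean stay inside the range of honest readings. The trimmed mean only holds if `trim` is at least the number of adversaries, which is the kind of assumption a robust aggregator has to state explicitly.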
As collectives of large-language-model agents move from demos to deployment, their failure modes have become an active research area, and they rhyme exactly with the classical swarm failures of this section. Studies of multi-agent LLM systems (the MAST taxonomy and related 2024-2025 analyses of why multi-agent LLM systems fail) document herding among agents that defer to a confident peer rather than their own reasoning, premature convergence when a debate collapses to early agreement, and oscillation when agents loop without terminating, the cascade, the stagnation, and the livelock of this section in linguistic dress. A parallel line studies adversarial and correlated failure: a single compromised or prompt-injected agent steering a multi-agent consensus, and error propagation where one agent's hallucination is cited as fact by the rest, the Byzantine-steering of Section 5 with natural language as the attack surface. The proposed defenses are recognizably the swarm mitigations, enforced diversity of agent roles and prompts, debate and self-consistency to aggregate independent reasoning rather than copy, damping via verifier or critic agents, and circuit-breaker monitors that halt a runaway agent loop. Chapter 32 builds the orchestration where these defenses live; the point here is that emergent failure did not disappear when the agents learned to talk, it changed costume.
6. Mitigations: Monitor, Diversify, Damp, and Break the Circuit (Intermediate)
Every failure in this section has the same shape, a feedback loop consuming the diversity or stability the collective needed, so the mitigations form one short, repeated toolkit. The first is to maintain diversity deliberately: pheromone evaporation and bounds in ACO, mutation and sub-swarms in PSO, randomized exploration in an agent fleet, independent signals in a crowd. Diversity is the variance that lets a collective keep learning, and every cure above is at bottom a way to stop a positive-feedback loop from spending it all. The second is negative feedback and damping: a term proportional to the rate of change that opposes runaway amplification and over-correction, turning oscillation into convergence and a stampede into a smooth flow. The third is randomization, which breaks the synchrony and symmetry behind lockstep oscillation, deadlock, and livelock; randomized back-off and randomized tie-breaking are the swarm versions of the same idea. The fourth is monitoring the collective at the collective level, watching population-scale metrics (diversity, concentration, consensus spread, oscillation amplitude) that no single agent can see, which is the distributed-monitoring discipline of Chapter 26 pointed at emergent behavior rather than at request latency.
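Population-scale metrics of this kind are cheap to compute once agent choices are collected in one place. The sketch below is one possible pair, chosen as an assumption for illustration: Shannon entropy normalized by the number of available options as a diversity measure, and a Herfindahl-style sum of squared shares as a concentration measure. The function names and sample data are hypothetical.

```python
from collections import Counter
from math import log


def normalized_entropy(choices, num_options: int) -> float:
    """Shannon entropy of the choice distribution divided by log(num_options):
    1.0 means choices are spread evenly, 0.0 means every agent chose the same."""
    n = len(choices)
    shares = [c / n for c in Counter(choices).values()]
    h = sum(-s * log(s) for s in shares)
    return h / log(num_options) if num_options > 1 else 0.0


def concentration(choices) -> float:
    """Herfindahl-style index: sum of squared shares (1.0 = total lock-in)."""
    n = len(choices)
    return sum((c / n) ** 2 for c in Counter(choices).values())


healthy = ["a", "b", "c", "d"] * 5              # 20 agents spread over 4 options
herded = ["a"] * 19 + ["b"]                     # 19 of 20 agents on one option

for name, choices in [("healthy", healthy), ("herded", herded)]:
    print(f"{name:8s} diversity = {normalized_entropy(choices, 4):.3f}  "
          f"concentration = {concentration(choices):.3f}")
```

Neither number is visible to any single agent, which only knows its own choice. A monitor that tracks them over time can alert when diversity falls or concentration rises faster than the designer expects.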
The fifth mitigation is the safety net the others lean on: the circuit breaker. A circuit breaker is a monitor with the authority to act, a rule that halts or throttles the collective when a population-level signal crosses a threshold, before the cascade completes. Stock exchanges literally call them circuit breakers: trading halts automatically when the index falls too fast, breaking the flash-crash feedback loop by removing the agents' ability to keep selling. The pattern generalizes to any swarm: a kill-switch on a runaway recommendation loop, a cap on how fast any pheromone trail or any agent can dominate, a monitor that pauses a multi-agent debate that has stopped terminating. The circuit breaker concedes the deep lesson of this section, that you cannot always prevent an emergent failure, so you must be able to detect and interrupt it. None of these five is exotic; together they are the difference between a collective system that is safe because its emergent behavior is watched and bounded, and one that is merely assumed safe because nothing has gone wrong yet.
This chapter's place on the book's spine is the most decentralized form of distribution: no coordinator, only local rules, stigmergy, feedback, and consensus. Section 31.8 cashed in the upside, scalability and robustness to random failure that a centralized design cannot match. This section pays the bill. Removing the central coordinator removes the central referee, monitor, and kill-switch, so every safeguard a centralized system gets for free, tie-breaking, global oversight, an authoritative halt, must be re-created from local rules and collective-level monitors. The distributed-systems tax this book has tracked since Section 1.1, communication and failure, takes its final and subtlest form here: in a swarm the failure tax includes failures that exist only at the level of the collective, and the only place to pay it is at that same level.
The collective-level monitor-and-halt pattern is exactly the circuit breaker that resilience libraries provide for distributed services, so you wire a swarm safeguard from a battle-tested component instead of writing the state machine yourself. The PyBreaker library (and its equivalents like resilience4j on the JVM) gives a breaker that trips after a threshold of bad signals, blocks calls while open, and probes for recovery, the open / closed / half-open lifecycle you would otherwise reimplement:
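# pip install pybreaker
import pybreaker

# Placeholders (assumptions) so the sketch runs; swap in your own monitor and actuator.
THRESHOLD = 0.8
def collective_anomaly_score():
    return 0.0          # population-level metric, e.g. concentration or oscillation amplitude
def execute(action):
    return action

# Trip after 5 consecutive anomalous collective readings; stay open 60s, then probe.
swarm_breaker = pybreaker.CircuitBreaker(fail_max=5, reset_timeout=60)

@swarm_breaker
def apply_swarm_action(action):
    if collective_anomaly_score() > THRESHOLD:  # population-level monitor
        raise RuntimeError("emergent anomaly: divergence / oscillation / cascade")
    return execute(action)                      # otherwise act normally

# Once tripped, calls raise pybreaker.CircuitBreakerError without running the body,
# so the loop halts; the caller catches that error to page a human or run a
# fallback policy: the flash-crash trading halt as a decorator.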
7. Chapter Summary and the Handoff to Orchestrated Agents (Beginner)
This section closes Chapter 31, the chapter on the most decentralized and, when it works, the most robust and scalable form of coordination in the book. We began with collective intelligence and swarm intelligence (Sections 31.1 and 31.2), the idea that many simple agents following local rules can produce global competence no individual possesses. We turned that idea into optimization with ant colony optimization (Section 31.3) and particle swarm optimization (Section 31.4), where stigmergic pheromone trails and social attraction search a space no single agent could. We turned it into coordination and sensing with flocking and distributed consensus (Section 31.5) and collective perception (Section 31.6), and into language and self-organization with emergent communication (Section 31.7) and coordination without central control (Section 31.8). The thread through all of it was emergence: local rules, feedback, stigmergy, and consensus producing global behavior. This final section completed the picture by showing emergence has no built-in sign. The same loops that compute can also herd, stagnate, oscillate, deadlock, and crash, and because these failures live in the coupling between agents rather than inside any one of them, they must be monitored and safeguarded at the collective level, never assumed benign.
Swarm intelligence is distribution taken to its limit: no central coordinator, only local rules, stigmergy, feedback, and consensus, which buys unmatched scalability and robustness to random failure. Ant colony and particle swarm optimization (Sections 31.3 and 31.4) turn those local rules into global search; flocking, consensus, and collective perception (Sections 31.5 and 31.6) turn them into coordinated motion and shared sensing; emergent communication and leaderless coordination (Sections 31.7 and 31.8) turn them into self-organized protocols. The catch is that emergence is hard to design forward (the inverse problem) and can fail collectively in ways no agent intended: herding cascades, premature convergence, oscillation, deadlock, and flash-crash dynamics, all of which are positive feedback consuming diversity or stability. The cure is one toolkit applied at the collective level: maintain diversity, add negative-feedback damping, randomize to break symmetry, monitor population-scale metrics, and install circuit breakers that can halt a runaway loop. Emergent collective behavior must be watched and bounded, not trusted by default.
The next chapter changes the kind of agent without changing the lesson. Chapter 31's agents were deliberately minimal, an ant, a particle, a boid, carrying almost no internal state. Chapter 32 replaces them with large-language-model agents that plan, use tools, and reason in natural language, and orchestrates them into distributed systems that solve tasks no single agent can. Everything in this section survives the upgrade. A fleet of LLM agents is still a collective; it still herds when agents defer to a confident peer, still converges prematurely when a debate ends too soon, still oscillates when agents loop without halting, and is still steerable by one compromised or prompt-injected member. The orchestration patterns of Chapter 32, planner-executor roles, verifier and critic agents, debate and self-consistency, are the diversity, damping, monitoring, and circuit-breaking of this section, rebuilt for agents that talk. You leave Chapter 31 knowing how local rules become global competence and global catastrophe; you enter Chapter 32 to learn how to orchestrate the former while engineering against the latter.
Using the Bayesian model of Section 1, work through the first three agents by hand for private-signal accuracy $p = 0.7$. Agent 1 receives a signal and acts on it. Suppose Agent 1 adopts and Agent 2 also receives an "adopt" signal; what does Agent 2 do, and is its action still informative? Now suppose Agent 1 adopts but Agent 2 receives a "reject" signal: show that Agent 2 is indifferent and (by the standard tie-breaking rule) follows its own signal, so that Agent 3, seeing one adopt and one reject, is back to relying on its own signal. Identify the exact public tally at which the cascade locks and explain why, from that point on, the public record stops gaining information no matter how many agents follow.
Extend Code 31.9.1 with a parameter $\rho \in [0,1]$, the probability that any given agent ignores the public tally and acts on its own private signal (so $\rho = 0$ is pure herding and $\rho = 1$ is pure diversity). Sweep $\rho$ from $0$ to $1$ in steps of $0.1$ and plot or print collective accuracy versus $\rho$. Find the smallest $\rho$ at which collective accuracy rises clearly above the single-agent baseline of $0.600$. Then vary the private accuracy $p$ and the agent count $N$: does a larger crowd help under herding, and does it help under diversity? Explain your two answers using the Condorcet jury theorem and the cascade-lock argument.
A collective-perception swarm of $n$ agents estimates a shared scalar by averaging each agent's reading. An adversary controls $f$ of the agents and makes each report an arbitrary value to drag the mean. (a) Derive how far the adversary can move the swarm's averaged estimate as a function of $f$, $n$, and the range of legal readings, and show it grows without bound in the reading magnitude even for small $f/n$. (b) Now replace the mean with the coordinate-wise median; argue why the adversary needs $f \ge n/2$ to move the estimate arbitrarily, and state the price the honest agents pay in efficiency for that robustness. (c) Connect your result to the Byzantine-robust aggregation of Chapter 35: which "trust the average" steps elsewhere in this chapter (Section 31.5 consensus, Section 31.6 collective perception) would you replace, and with what?
These projects close the chapter and are sized so that one, carried through, becomes a substantial project or a seed for the capstone (Chapter 41).
1. A swarm-robotics simulation with built-in failure modes. Build a 2D simulation of $50$ to $200$ agents doing flocking (Section 31.5) and a shared task such as collective foraging or doorway crossing. Then deliberately induce each failure from this section: tune the velocity-matching gain until the flock oscillates, narrow a corridor until the swarm deadlocks, and remove pheromone evaporation until foraging stagnates on a suboptimal source. For each, implement one mitigation (damping, randomized back-off, evaporation) and measure the recovery. Deliver a dashboard of population-level metrics (diversity, oscillation amplitude, throughput) so the failures and cures are visible at the collective level, not just animated.
2. Reproduce an information cascade, then defeat it. Start from Code 31.9.1 and grow it into a study: reproduce the cascade across a grid of private-accuracy $p$, crowd size $N$, and a network topology where agents see only some predecessors (not a perfect public tally). Quantify how often the crowd locks onto the wrong answer, then implement and compare at least three mitigations, mandatory independent action with probability $\rho$, a minority of contrarian agents, and a "reveal your private signal" protocol, and report which buys the most accuracy per unit of imposed diversity. This is a publishable-shaped small experiment on the wisdom-versus-herding boundary.
3. A circuit breaker for an agent fleet. Take a small multi-agent system (even a simulated market of threshold-trading agents, or a fleet of recommendation agents on synthetic users) and instrument it with a collective-level monitor and a circuit breaker (Code 31.9.2). Engineer a shock that triggers a flash-crash-style cascade, then show the breaker detecting the population-level anomaly and halting the loop before the cascade completes. Measure the false-positive rate (how often the breaker trips on benign volatility) against the loss it prevents, the core engineering trade-off of any safety net, and forward-link your design to the orchestration safeguards of Chapter 32.