A single agent with three tools is a demo. The real problems start when you try to make agents work together: one burns through its context chasing a task it should have delegated. Another hallucinates when it runs out of working memory. A third gets stuck in a retry loop while the others wait. Your carefully designed system cascades into failure — and you learn more from the collapse than from anything that worked.
This course makes those failures visible, playable, and fun. You build a game whose mechanics ARE agent architecture. The health bar IS context management. The party system IS multi-agent delegation. The encounter system IS task classification. The save system IS persistence. When your colony of agents spirals into chaos, you can see exactly why — because the game renders every architectural decision as something you can watch, manage, and learn from.
You are The Operator. Your agents explore worlds generated from real codebases. You direct them, configure them, and intervene when things go wrong. When things go very wrong, the run ends and you start again with better architecture.
What you'll build
A colony management game that is also, by the end, a deployable multi-agent system:
- A world engine built on just-bash. Your codebase becomes the world — directories are rooms, files are artifacts, `node_modules/` is a maze of twisty little passages, all alike. World generation scans for bugs (TODOs, type errors), notable files, and history (`git log`). Different filesystem backends map to different environment types: virtual (safe, disposable), copy-on-write (explore real projects without risk), and real execution (Firecracker microVMs via Vercel Sandbox).
- Agents with roles and resources. Each agent has a role (scout, executor, critic, coordinator), an ability loadout (tools), a context budget (health bar), a token budget (energy), a personality (system prompt), and performance that degrades under pressure. Powered by AI SDK — `generateText` with tools, structured output for world state, streaming for real-time updates.
- An ability system where AI SDK tools are game abilities. Search, read, write, and execute are wrapped as game mechanics with themed output. Custom commands via `defineCommand()` add game-specific abilities. Tool descriptions are the model selection API — the same lesson the harness course teaches, rendered as ability tooltips.
- An encounter system where tasks arrive with visible stats: difficulty, type, required abilities, and reward. Encounters fan out to agent consumer groups via Vercel Queues. Deduplication prevents duplicate work. Delayed delivery schedules future encounters.
- Error handling as combat. API timeouts are damage. Cascade failures are poison. Recovery abilities: retry (immediate heal), exponential backoff (rest), circuit breaker (shield), human escalation (expensive but reliable). Agents that can't handle errors fail at the first real encounter.
- Context and cost management as resource management. The context budget depletes with every action — a visible bar with green/yellow/red zones. When it runs out, the agent loses coherence. The token budget is spent on abilities — cheap models for scouting, strong models for execution. Model routing through AI Gateway IS resource conservation.
- Multi-agent delegation as party composition. 🔍 Scout (cheap model, read-only, fast). ⚔️ Executor (strong model, full tools, expensive). 🛡️ Critic (verification, testing). 👑 Coordinator (plans, delegates, aggregates). Party formation = choosing which subagents to spawn. Inter-agent communication via Queues.
- Human-in-the-loop as dialogue. The agent encounters ambiguity → pauses → asks The Operator via Vercel Workflow hooks. Multiple choice options. Timed responses. Agents that ask good questions waste less time than agents that guess.
- Skill composition as crafting. Forge composite abilities from primitives. The tech tree IS extensibility. Progressive disclosure — abilities unlock as the agent levels up.
- Persistence as world state. Convex stores agent memories, colony state, and the decision record. Sandbox snapshots are save states. Log out, come back, the world remembers.
- Deployment as the endgame. Deploy the colony on Vercel. Next.js dashboard as the operator console. Workflow orchestrates durable task processing. Queues stream events between agents. Sandbox provides isolated execution. The colony runs autonomously.
The companion skill
The course ships with a companion skill:
```
npx skills add vercel-labs/academy-skills --skill=the-claw -y
```
Your coding agent reads the skill and starts playing. It begins knowing only how to explore. Each section gives it new capabilities — tools, encounter handling, delegation, persistence — until it can operate independently.
Lesson content lives on Academy as MDX, served via .md endpoints. The agent fetches lessons, absorbs patterns, and gains abilities. You read along in the browser for diagrams, theory, and deeper context. Two audiences, same content.
Prerequisites
- TypeScript, async/await, basic terminal experience
- An `AI_GATEWAY_API_KEY` environment variable
- Node.js 20+ or Bun runtime
- A coding agent (Cursor, Codex, pi, Claude Code, or any agent that reads skills)
- Recommended: Building Filesystem Agents course
How the course works
Causal sequence. Each section exists because the previous one broke something. Section 1 spawns an agent that can explore but can't act. Section 2 gives it tools because it found problems it couldn't fix. Section 3 adds encounters because tools without purpose are chaos. The game generates the motivation for the next mechanic.
Coding agent amplified. You direct a coding agent to build the game. The agent writes the code. You make the design decisions. Ambition per section is calibrated to decision-making, not typing.
Whole-task from day 1. From the first lesson, you have a running game with an agent exploring your codebase. Every section adds a mechanic while the game stays playable. Scaffolding fades — early sections model heavily, later sections give requirements only.
Losing is fun. Agent failures are the most instructive part of the course. Context overflow, hallucination, cascade errors — the game renders these visibly so you can diagnose them. The question isn't "will my agents succeed?" but "what will break, and how will I build resilience?"
The game engine
just-bash is the game engine. The world is a just-bash instance. World generation is file creation. Agent actions are bash commands.
Filesystem backends = environment types
| Backend | Environment | Properties |
|---|---|---|
| `InMemoryFs` | Training | Pure virtual. Safe and disposable. |
| `OverlayFs` | Shadow copy | Copy-on-write over real codebase. Reads from disk, writes stay in memory. |
| `ReadWriteFs` | Real execution | Real filesystem. Late-game only. Real consequences. |
| `MountableFs` | Multi-mount | Different FS types at different paths. Read-only knowledge + read-write workspace. |
File values can be functions — called on first read, cached after. Unvisited rooms don't generate content until an agent enters. Fog of war at the engine level.
Codebases aren't trees — they're graphs. Imports, cross-file references, and dependencies create threads: non-adjacent connections between distant rooms. A reference in one file that points to another is a marker. Threads can be tangled (circular dependencies), severed (dead imports), or dense (high coupling). The web of threads IS the dependency graph, discoverable by searching.
Found markers are already in the code — imports, cross-references, comments pointing elsewhere. Placed markers are the agent's own bookmarks — locations saved for quick return. An agent that learns to place markers stops wasting context re-navigating the same paths.
80+ built-in commands
Exploration: `ls`, `tree`, `cat`, `head`, `tail`, `find`, `stat`, `file`
Search: `grep`, `rg`, `sed`, `awk`, `jq`, `yq`, `xan`, `diff`
Data: `sqlite3`, `python3`, `sort`, `uniq`, `wc`
Construction: `mkdir`, `cp`, `mv`, `ln`, `touch`, `tee`, `tar`, `gzip`
Network: `curl` with URL allowlists, html-to-markdown
Custom commands
`defineCommand()` adds TypeScript commands to the shell. Each custom command is a game mechanic that maps to an agent architecture pattern.
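The shape of the pattern, independent of just-bash's actual `defineCommand()` signature (which may differ from this sketch): a command is a named TypeScript handler, and the handler returns themed output the agent sees as an ability result.

```typescript
// Illustrative registry only; the real just-bash API may differ.
type CommandHandler = (args: string[]) => string;

const commands = new Map<string, CommandHandler>();

function defineCommandSketch(name: string, handler: CommandHandler): void {
  commands.set(name, handler);
}

// A hypothetical game-specific ability: raw file inspection wrapped in
// themed output, the "tools as game mechanics" pattern from above.
defineCommandSketch("inspect", (args) =>
  args.length === 0
    ? "inspect: missing target"
    : `You examine ${args[0]}. It hums faintly.`
);

// A tiny dispatcher standing in for the shell.
function run(line: string): string {
  const [name, ...args] = line.trim().split(/\s+/);
  const handler = commands.get(name);
  return handler ? handler(args) : `${name}: command not found`;
}
```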
Execution limits
`maxCallDepth`, `maxCommandCount`, `maxLoopIterations` — lower limits = harder difficulty. Infinite loops auto-caught.
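How bounded execution catches runaway agents can be sketched in a few lines. The class and field names below are illustrative (they echo the limit names above but are not the just-bash implementation):

```typescript
interface Limits {
  maxCommandCount: number;   // total commands per run
  maxLoopIterations: number; // cap on any single loop
}

// Hypothetical runner: lower limits = harder difficulty.
class LimitedRunner {
  private commandsRun = 0;
  constructor(private limits: Limits) {}

  runCommand(fn: () => void): void {
    if (++this.commandsRun > this.limits.maxCommandCount) {
      throw new Error("command budget exhausted");
    }
    fn();
  }

  // An infinite loop can never spin forever: iteration is bounded,
  // and exceeding the bound is surfaced as a caught failure.
  loop(body: (i: number) => boolean): number {
    for (let i = 0; i < this.limits.maxLoopIterations; i++) {
      if (!body(i)) return i; // loop finished normally
    }
    throw new Error("loop limit exceeded");
  }
}
```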
Environment progression
just-bash's `Sandbox` class is API-compatible with `@vercel/sandbox`. Start virtual. Graduate to Firecracker microVMs by swapping one import. Same agent, same abilities, different environment.
The causal chain
The section structure emerges from building. We build, discover what breaks, and add the mechanic that fixes it.
What's locked:
- First lesson: Install the skill. Agent spawns. World generates. You're playing.
- Last section: Capstone — "Your agents are ready. Point them at something real."
- The principle: Each section adds a game mechanic because the previous state broke. Causal, not thematic.
The pattern: you build something, it works, then it doesn't. The agent can explore but can't act — it needs tools. The agent has tools but acts randomly — it needs encounters with objectives. Encounters fail with no recovery — it needs error handling. Each friction point IS the motivation for the next game system IS the motivation for the next architecture pattern.
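The recovery abilities that error handling adds are ordinary resilience primitives. A sketch with illustrative names, mapping the game's terms to code: exponential backoff is "rest", the circuit breaker is a "shield" that stops calls after repeated failures.

```typescript
// Hypothetical sketch, not the course's actual implementation.

// Exponential backoff: each rest is twice as long as the last.
function backoffDelays(attempts: number, baseMs = 100): number[] {
  return Array.from({ length: attempts }, (_, i) => baseMs * 2 ** i);
}

// Circuit breaker: after `threshold` consecutive failures the shield goes up
// and the agent stops calling the failing dependency instead of taking poison
// damage from a cascade. A success lowers the shield.
class CircuitBreaker {
  private failures = 0;
  constructor(private threshold: number) {}

  get open(): boolean {
    return this.failures >= this.threshold;
  }

  record(ok: boolean): void {
    this.failures = ok ? 0 : this.failures + 1;
  }
}
```

Real breakers usually add a half-open probe state before fully closing again; the two-state version here is the minimum that makes the mechanic visible.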
Each section uses scaffolding that fades: modeling (watch an agent handle it) → imitation (do what it did) → completion (fill in the gap) → independent (new encounters yourself).
Agent anatomy
Every agent has:
| Attribute | Game mechanic | Architecture pattern |
|---|---|---|
| Role | Scout, executor, critic, coordinator | Subagent specialization |
| Abilities | Equipped tools | Tool registry per agent |
| Context budget | Health bar — depletes with every action | Context window usage |
| Token budget | Energy for powerful abilities | API cost management |
| Personality | System prompt | Prompt engineering |
| Performance | Healthy → stressed → degraded → failing | Error accumulation |
| Relationships | Trust levels, delegation preferences | Inter-agent coordination |
| Memory | History, learned patterns, records | Context, session state, long-term storage |
Agents act autonomously. The Operator observes, configures, and intervenes. The gap between what you asked an agent to do and what it actually does IS the game.
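The anatomy table translates directly into a state type. A sketch with hypothetical field names and thresholds, showing how the healthy → stressed → degraded → failing ladder can be derived from accumulated errors and remaining context:

```typescript
type Role = "scout" | "executor" | "critic" | "coordinator";
type Performance = "healthy" | "stressed" | "degraded" | "failing";

// Illustrative shape only; the course's real agent state may differ.
interface AgentState {
  role: Role;
  abilities: string[];      // equipped tools
  contextRemaining: number; // health bar, 0..1
  tokenBudget: number;      // energy
  errorCount: number;       // accumulated failures
}

// Performance degrades as errors accumulate and context drains:
// whichever pressure is worse sets the tier.
function performance(a: AgentState): Performance {
  if (a.errorCount >= 3 || a.contextRemaining < 0.1) return "failing";
  if (a.errorCount === 2 || a.contextRemaining < 0.25) return "degraded";
  if (a.errorCount === 1 || a.contextRemaining < 0.5) return "stressed";
  return "healthy";
}
```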
Multi-model routing
Different abilities route to different models via AI Gateway:
| Ability tier | Model class | Cost | Use |
|---|---|---|---|
| Scout/explore | Small, fast | Low | Descriptions, navigation, quick search |
| Execute/reason | Large, strong | High | Complex tasks, code analysis, planning |
| Generate visuals | Image model | Medium | Portraits, illustrations, diagrams |
The learner learns multi-model routing because the game economy demands it. Using the strongest model for everything drains resources before the hard encounters arrive. The AI Gateway model catalog is the living reference — the course teaches the routing pattern, not specific model names.
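The routing table is data, not cleverness. A sketch of tier-based selection with placeholder model names (in the course, resolution goes through AI Gateway and its catalog is the source of truth):

```typescript
type Tier = "scout" | "execute" | "visual";

// Placeholder names and prices, for illustration only.
const routing: Record<Tier, { model: string; costPer1kTokens: number }> = {
  scout:   { model: "small-fast-model",   costPer1kTokens: 0.0002 },
  execute: { model: "large-strong-model", costPer1kTokens: 0.01 },
  visual:  { model: "image-model",        costPer1kTokens: 0.004 },
};

// The economy rule: pick by tier, never default to the strongest model.
function pickModel(tier: Tier): string {
  return routing[tier].model;
}

function cost(tier: Tier, tokens: number): number {
  return (tokens / 1000) * routing[tier].costPer1kTokens;
}
```

With a 50x price gap between tiers, a scout that burns ten thousand tokens exploring still costs less than a single short executor call, which is exactly the pressure the game economy applies.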
The world grows
The game world IS the system the learner is building. It starts as a filesystem and grows into a distributed system:
Early: A directory tree. Rooms connected by paths. Agents walk through the local filesystem.
Mid-course: Portals appear — API endpoints connecting to external services. A shared database every agent can access. Message queues carry events between rooms. The world is a network.
Late course: The world IS a distributed system. Rooms across multiple environments. Portals that sometimes close (network failures). Queues that back up (congestion). Databases that rate-limit (resource exhaustion). The game mechanics for handling this ARE distributed systems patterns.
Deployed: The agents aren't navigating a representation of a distributed system. They ARE the distributed system. The game was the architecture the whole time.
Course sections
Sections are built iteratively. This list grows as the game grows.
Section 1: First Run
Set up the game engine. Generate a world from your codebase. Your agent explores.
- Your First Agent — Install the skill, generate the world, start exploring
What you walk away with
- A deployed multi-agent system — running on Vercel, world state in Convex, operator dashboard live, agents running autonomously. A game that is also, if you choose, a system that does real work.
- Agent architecture knowledge — absorbed through play. Tool loops, context management, delegation, error handling, persistence, event-driven systems. Learned because you needed it to keep your agents running.
- Vercel platform fluency — Workflow, Sandbox, Queues, Convex, AI SDK, AI Gateway, just-bash, bash-tool, deployment. Learned because the game required each primitive.
- A mental model — "my agent system IS a colony" makes architecture decisions intuitive. When someone asks "how should I handle agent coordination?", you think in party composition, not in abstract diagrams.
- Stories — emergent narratives from your runs. The scout that found a circular dependency. The executor that cascaded. The failure that took the whole colony down. If you're telling stories about your agents, the course worked.
Tech stack
| Component | Purpose | Game role |
|---|---|---|
| AI SDK | Agent loops, tools, structured output, streaming | Agent brains |
| AI Gateway | Model routing | Ability tiers, resource management |
| just-bash | Virtual sandbox, game engine | World construction, abilities, execution limits |
| bash-tool | AI SDK tool wrapper | Agent ability interface |
| Vercel Sandbox | Firecracker microVMs | Real execution environment |
| Vercel Workflow | Durable multi-step execution | Task workflows, human-in-the-loop hooks |
| Vercel Queues | Durable event streaming | Task fan-out, inter-agent signals |
| Convex | Reactive database | World state, agent memories, decision records |
| Next.js | Web surface | Operator dashboard |
| Zod | Schema validation | Task contracts, structured output |
Relationship to other courses
| Course | Relationship |
|---|---|
| Build Your Own AI Agent Harness | Same architecture patterns, different frame. The harness course teaches through engineering. This course teaches through game design. Same concepts from multiple directions = deeper learning. |
| Building Filesystem Agents | Entry point. Filesystem Agents teaches the basics. This course goes deeper — multi-agent, persistence, deployment. |
| AI SDK | Foundation. This course uses AI SDK but teaches at the architecture pattern layer. |
Design approach
Three frameworks govern every design decision:
Raph Koster (A Theory of Fun) — Fun = mastering patterns in a safe space. Agent architecture patterns ARE the patterns the brain absorbs. The game is the safe space for experimentation.
Jesse Schell (The Art of Game Design) — The Elemental Tetrad: AI SDK (technology) + game mechanics (agent patterns) + colony narrative (story) + terminal aesthetic. LLM non-determinism IS the Chance mechanic. Token budgets ARE the Economy.
Van Merriënboer (Ten Steps to Complex Learning) — Whole-task from day 1. Scaffolding fades. Supportive information before tasks. Procedural information just-in-time. Sequence driven by what breaks.
The design lens is Dwarf Fortress. Colony management with autonomous actors. The operator manages systems, not individual agents. Emergent narrative from simple rules. Losing is the best teacher.
The course shouldn't feel like "learning agent architecture" any more than chess feels like "learning spatial reasoning." The patterns are absorbed through play.