
Build AI Agents Through Game Design

Build a colony management game where autonomous AI agents explore worlds generated from real codebases. Every game mechanic is an agent architecture pattern — tool use, context management, multi-agent delegation, error handling, persistence, and deployment. Built with AI SDK, just-bash, Vercel Workflow, Queues, and Sandbox.

A single agent with three tools is a demo. The real problems start when you try to make agents work together: one burns through its context chasing a task it should have delegated. Another hallucinates when it runs out of working memory. A third gets stuck in a retry loop while the others wait. Your carefully designed system cascades into failure — and you learn more from the collapse than from anything that worked.

This course makes those failures visible, playable, and fun. You build a game whose mechanics ARE agent architecture. The health bar IS context management. The party system IS multi-agent delegation. The encounter system IS task classification. The save system IS persistence. When your colony of agents spirals into chaos, you can see exactly why — because the game renders every architectural decision as something you can watch, manage, and learn from.

You are The Operator. Your agents explore worlds generated from real codebases. You direct them, configure them, and intervene when things go wrong. When things go very wrong, the run ends and you start again with better architecture.


What you'll build

A colony management game that is also, by the end, a deployable multi-agent system:

  • A world engine built on just-bash. Your codebase becomes the world — directories are rooms, files are artifacts, node_modules/ is a maze of twisty little passages, all alike. World generation scans for bugs (TODOs, type errors), notable files, and history (git log). Different filesystem backends map to different environment types: virtual (safe, disposable), copy-on-write (explore real projects without risk), and real execution (Firecracker microVMs via Vercel Sandbox).

  • Agents with roles and resources. Each agent has a role (scout, executor, critic, coordinator), an ability loadout (tools), a context budget (health bar), a token budget (energy), a personality (system prompt), and performance that degrades under pressure. Powered by AI SDK generateText with tools, structured output for world state, streaming for real-time updates.

  • An ability system where AI SDK tools are game abilities. Search, read, write, and execute are wrapped as game mechanics with themed output. Custom commands via defineCommand() add game-specific abilities. Tool descriptions are the model selection API — the same lesson the harness course teaches, rendered as ability tooltips.

  • An encounter system where tasks arrive with visible stats: difficulty, type, required abilities, and reward. Encounters fan out to agent consumer groups via Vercel Queues. Deduplication prevents duplicate work. Delayed delivery schedules future encounters.

  • Error handling as combat. API timeouts are damage. Cascade failures are poison. Recovery abilities: retry (immediate heal), exponential backoff (rest), circuit breaker (shield), human escalation (expensive but reliable). Agents that can't handle errors fail at the first real encounter.

  • Context and cost management as resource management. The context budget depletes with every action — a visible bar with green/yellow/red zones. When it runs out, the agent loses coherence. The token budget is spent on abilities — cheap models for scouting, strong models for execution. Model routing through AI Gateway IS resource conservation.

  • Multi-agent delegation as party composition. 🔍 Scout (cheap model, read-only, fast). ⚔️ Executor (strong model, full tools, expensive). 🛡️ Critic (verification, testing). 👑 Coordinator (plans, delegates, aggregates). Party formation = choosing which subagents to spawn. Inter-agent communication via Queues.

  • Human-in-the-loop as dialogue. The agent encounters ambiguity → pauses → asks The Operator via Vercel Workflow hooks. Multiple choice options. Timed responses. Agents that ask good questions waste less time than agents that guess.

  • Skill composition as crafting. Forge composite abilities from primitives. The tech tree IS extensibility. Progressive disclosure — abilities unlock as the agent levels up.

  • Persistence as world state. Convex stores agent memories, colony state, and the decision record. Sandbox snapshots are save states. Log out, come back, the world remembers.

  • Deployment as the endgame. Deploy the colony on Vercel. Next.js dashboard as the operator console. Workflow orchestrates durable task processing. Queues stream events between agents. Sandbox provides isolated execution. The colony runs autonomously.

The companion skill

The course ships with a companion skill:

npx skills add vercel-labs/academy-skills --skill=the-claw -y

Your coding agent reads the skill and starts playing. It begins knowing only how to explore. Each section gives it new capabilities — tools, encounter handling, delegation, persistence — until it can operate independently.

Lesson content lives on Academy as MDX, served via .md endpoints. The agent fetches lessons, absorbs patterns, and gains abilities. You read along in the browser for diagrams, theory, and deeper context. Two audiences, same content.

Prerequisites

  • TypeScript, async/await, basic terminal experience
  • An AI_GATEWAY_API_KEY environment variable
  • Node.js 20+ or Bun runtime
  • A coding agent (Cursor, Codex, pi, Claude Code, or any agent that reads skills)
  • Recommended: Building Filesystem Agents course

How the course works

Causal sequence. Each section exists because the previous one broke something. Section 1 spawns an agent that can explore but can't act. Section 2 gives it tools because it found problems it couldn't fix. Section 3 adds encounters because tools without purpose are chaos. The game generates the motivation for the next mechanic.

Coding agent amplified. You direct a coding agent to build the game. The agent writes the code. You make the design decisions. Ambition per section is calibrated to decision-making, not typing.

Whole-task from day 1. From the first lesson, you have a running game with an agent exploring your codebase. Every section adds a mechanic while the game stays playable. Scaffolding fades — early sections model heavily, later sections give requirements only.

Losing is fun. Agent failures are the most instructive part of the course. Context overflow, hallucination, cascade errors — the game renders these visibly so you can diagnose them. The question isn't "will my agents succeed?" but "what will break, and how will I build resilience?"


The game engine

just-bash is the game engine. The world is a just-bash instance. World generation is file creation. Agent actions are bash commands.

Filesystem backends = environment types

| Backend | Environment | Properties |
| --- | --- | --- |
| InMemoryFs | Training | Pure virtual. Safe and disposable. |
| OverlayFs | Shadow copy | Copy-on-write over real codebase. Reads from disk, writes stay in memory. |
| ReadWriteFs | Real execution | Real filesystem. Late-game only. Real consequences. |
| MountableFs | Multi-mount | Different FS types at different paths. Read-only knowledge + read-write workspace. |

File values can be functions — called on first read, cached after. Unvisited rooms don't generate content until an agent enters. Fog of war at the engine level.

Codebases aren't trees — they're graphs. Imports, cross-file references, and dependencies create threads: non-adjacent connections between distant rooms. A reference in one file that points to another is a marker. Threads can be tangled (circular dependencies), severed (dead imports), or dense (high coupling). The web of threads IS the dependency graph, discoverable by searching.

Found markers are already in the code — imports, cross-references, comments pointing elsewhere. Placed markers are the agent's own bookmarks — locations saved for quick return. An agent that learns to place markers stops wasting context re-navigating the same paths.

80+ built-in commands

Exploration: ls, tree, cat, head, tail, find, stat, file
Search: grep, rg, sed, awk, jq, yq, xan, diff
Data: sqlite3, python3, sort, uniq, wc
Construction: mkdir, cp, mv, ln, touch, tee, tar, gzip
Network: curl with URL allowlists, html-to-markdown

Custom commands

defineCommand() adds TypeScript commands to the shell. Each custom command is a game mechanic that maps to an agent architecture pattern.
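
To make the idea concrete, here is a hypothetical shape for a custom command — the `CommandCtx` and `Command` types and this `defineCommand()` signature are assumptions for illustration, not just-bash's documented API. The point is that a TypeScript function becomes a shell command, and the shell command is a game ability:

```typescript
// Hypothetical command shape; the real defineCommand() may differ.

interface CommandCtx {
  args: string[];
  write: (line: string) => void;
}

interface Command {
  name: string;
  description: string; // doubles as the ability tooltip
  run: (ctx: CommandCtx) => number; // exit code
}

function defineCommand(cmd: Command): Command {
  return cmd; // a real engine would register this with the shell
}

// A game-specific ability: `scan` counts TODO markers in its input.
const scan = defineCommand({
  name: "scan",
  description: "Reveal bugs lurking in the current room.",
  run: ({ args, write }) => {
    const source = args.join(" ");
    const bugs = (source.match(/TODO/g) ?? []).length;
    write(`scan complete: ${bugs} bug(s) detected`);
    return 0; // success exit code
  },
});

const lines: string[] = [];
scan.run({ args: ["// TODO: fix", "// TODO: test"], write: (l) => lines.push(l) });
console.log(lines[0]); // "scan complete: 2 bug(s) detected"
```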

Execution limits

maxCallDepth, maxCommandCount, maxLoopIterations — lower limits = harder difficulty. Infinite loops auto-caught.
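
A loop guard in the spirit of maxLoopIterations looks roughly like this — the `LoopGuard` class is invented for illustration, but the mechanism (count iterations, throw past the limit) is the standard one:

```typescript
// Illustrative loop guard; names are invented for this sketch.

class LoopGuard {
  private iterations = 0;
  constructor(private readonly maxLoopIterations: number) {}

  // Call once per loop iteration; throws when the limit is exceeded.
  tick(): void {
    if (++this.iterations > this.maxLoopIterations) {
      throw new Error(`loop exceeded ${this.maxLoopIterations} iterations`);
    }
  }
}

// An accidental infinite loop is auto-caught instead of hanging the game.
function runEncounter(maxLoopIterations: number): string {
  const guard = new LoopGuard(maxLoopIterations);
  try {
    for (;;) guard.tick(); // agent stuck in a retry loop
  } catch (err) {
    return (err as Error).message;
  }
}

console.log(runEncounter(1_000)); // "loop exceeded 1000 iterations"
```

Lowering the limit raises the difficulty: the same agent behavior that survives at 1,000 iterations fails at 100.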

Environment progression

just-bash's Sandbox class is API-compatible with @vercel/sandbox. Start virtual. Graduate to Firecracker microVMs by swapping one import. Same agent, same abilities, different environment.


The causal chain

The section structure emerges from building. We build, discover what breaks, and add the mechanic that fixes it.

What's locked:

  • First lesson: Install the skill. Agent spawns. World generates. You're playing.
  • Last section: Capstone — "Your agents are ready. Point them at something real."
  • The principle: Each section adds a game mechanic because the previous state broke. Causal, not thematic.

The pattern: you build something, it works, then it doesn't. The agent can explore but can't act — it needs tools. The agent has tools but acts randomly — it needs encounters with objectives. Encounters fail with no recovery — it needs error handling. Each friction point IS the motivation for the next game system, which IS the next architecture pattern.

Each section uses scaffolding that fades: modeling (watch an agent handle it) → imitation (do what it did) → completion (fill in the gap) → independent (new encounters yourself).


Agent anatomy

Every agent has:

| Attribute | Game mechanic | Architecture pattern |
| --- | --- | --- |
| Role | Scout, executor, critic, coordinator | Subagent specialization |
| Abilities | Equipped tools | Tool registry per agent |
| Context budget | Health bar — depletes with every action | Context window usage |
| Token budget | Energy for powerful abilities | API cost management |
| Personality | System prompt | Prompt engineering |
| Performance | Healthy → stressed → degraded → failing | Error accumulation |
| Relationships | Trust levels, delegation preferences | Inter-agent coordination |
| Memory | History, learned patterns, records | Context, session state, long-term storage |
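
The anatomy table translates almost directly into a type. The field names below are invented for illustration — the course's actual shapes may differ — but they show how each row becomes state the game can render:

```typescript
// Sketch of agent anatomy as a TypeScript type; field names are assumptions.

type Role = "scout" | "executor" | "critic" | "coordinator";
type Performance = "healthy" | "stressed" | "degraded" | "failing";

interface Agent {
  role: Role;                    // subagent specialization
  abilities: string[];           // tool registry per agent
  contextBudget: number;         // health: remaining context tokens
  tokenBudget: number;           // energy: remaining spend
  personality: string;           // system prompt
  performance: Performance;      // error accumulation
  trust: Record<string, number>; // delegation preferences per peer
  memory: string[];              // learned patterns and records
}

const scout: Agent = {
  role: "scout",
  abilities: ["ls", "grep", "cat"],
  contextBudget: 32_000,
  tokenBudget: 5_000,
  personality: "Fast, curious, read-only. Report, never modify.",
  performance: "healthy",
  trust: { executor: 0.8 },
  memory: [],
};

console.log(`${scout.role}: ${scout.abilities.join(", ")}`);
```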

Agents act autonomously. The Operator observes, configures, and intervenes. The gap between what you asked an agent to do and what it actually does IS the game.


Multi-model routing

Different abilities route to different models via AI Gateway:

| Ability tier | Model class | Cost | Use |
| --- | --- | --- | --- |
| Scout/explore | Small, fast | Low | Descriptions, navigation, quick search |
| Execute/reason | Large, strong | High | Complex tasks, code analysis, planning |
| Generate visuals | Image model | Medium | Portraits, illustrations, diagrams |

You learn multi-model routing because the game economy demands it. Using the strongest model for everything drains resources before the hard encounters arrive. The AI Gateway model catalog is the living reference — the course teaches the routing pattern, not specific model names.
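
The routing pattern itself is simple. In this sketch the model IDs are placeholders, not real catalog names, and the energy costs are made up — the shape is what matters: tier in, model out, budget debited:

```typescript
// Routing sketch: ability tier -> model, paid from a shared energy budget.
// Model IDs and costs are placeholders; consult the AI Gateway catalog.

type Tier = "scout" | "execute" | "visual";

const routes: Record<Tier, { model: string; costPerCall: number }> = {
  scout:   { model: "small-fast-model",   costPerCall: 1 },
  execute: { model: "large-strong-model", costPerCall: 10 },
  visual:  { model: "image-model",        costPerCall: 5 },
};

// Spend from the colony's token budget; refuse when the energy runs out.
function route(tier: Tier, budget: { energy: number }): string {
  const { model, costPerCall } = routes[tier];
  if (budget.energy < costPerCall) {
    throw new Error(`not enough energy for ${tier} (${costPerCall} needed)`);
  }
  budget.energy -= costPerCall;
  return model;
}

const colony = { energy: 12 };
console.log(route("execute", colony)); // strong model: 10 energy spent
console.log(route("scout", colony));   // cheap model: 1 energy spent
// route("execute", colony) would now throw: only 1 energy remains
```

Scouting on the strong model is the classic beginner mistake this economy punishes: three expensive explorations and the budget for the actual encounter is gone.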


The world grows

The game world IS the system the learner is building. It starts as a filesystem and grows into a distributed system:

Early: A directory tree. Rooms connected by paths. Agents walk through the local filesystem.

Mid-course: Portals appear — API endpoints connecting to external services. A shared database every agent can access. Message queues carry events between rooms. The world is a network.

Late course: The world IS a distributed system. Rooms across multiple environments. Portals that sometimes close (network failures). Queues that back up (congestion). Databases that rate-limit (resource exhaustion). The game mechanics for handling this ARE distributed systems patterns.

Deployed: The agents aren't navigating a representation of a distributed system. They ARE the distributed system. The game was the architecture the whole time.


Course sections

Sections are built iteratively. This list grows as the game grows.

Section 1: First Run

Set up the game engine. Generate a world from your codebase. Your agent explores.


What you walk away with

  1. A deployed multi-agent system — running on Vercel, world state in Convex, operator dashboard live, agents running autonomously. A game that is also, if you choose, a system that does real work.

  2. Agent architecture knowledge — absorbed through play. Tool loops, context management, delegation, error handling, persistence, event-driven systems. Learned because you needed it to keep your agents running.

  3. Vercel platform fluency — Workflow, Sandbox, Queues, Convex, AI SDK, AI Gateway, just-bash, bash-tool, deployment. Learned because the game required each primitive.

  4. A mental model — "my agent system IS a colony" makes architecture decisions intuitive. When someone asks "how should I handle agent coordination?", you think in party composition, not in abstract diagrams.

  5. Stories — emergent narratives from your runs. The scout that found a circular dependency. The executor that cascaded. The failure that took the whole colony down. If you're telling stories about your agents, the course worked.


Tech stack

| Component | Purpose | Game role |
| --- | --- | --- |
| AI SDK | Agent loops, tools, structured output, streaming | Agent brains |
| AI Gateway | Model routing | Ability tiers, resource management |
| just-bash | Virtual sandbox, game engine | World construction, abilities, execution limits |
| bash-tool | AI SDK tool wrapper | Agent ability interface |
| Vercel Sandbox | Firecracker microVMs | Real execution environment |
| Vercel Workflow | Durable multi-step execution | Task workflows, human-in-the-loop hooks |
| Vercel Queues | Durable event streaming | Task fan-out, inter-agent signals |
| Convex | Reactive database | World state, agent memories, decision records |
| Next.js | Web surface | Operator dashboard |
| Zod | Schema validation | Task contracts, structured output |

Relationship to other courses

| Course | Relationship |
| --- | --- |
| Build Your Own AI Agent Harness | Same architecture patterns, different frame. The harness course teaches through engineering. This course teaches through game design. Same concepts from multiple directions = deeper learning. |
| Building Filesystem Agents | Entry point. Filesystem Agents teaches the basics. This course goes deeper — multi-agent, persistence, deployment. |
| AI SDK | Foundation. This course uses AI SDK but teaches at the architecture pattern layer. |

Design approach

Three frameworks govern every design decision:

Raph Koster (A Theory of Fun) — Fun = mastering patterns in a safe space. Agent architecture patterns ARE the patterns the brain absorbs. The game is the safe space for experimentation.

Jesse Schell (The Art of Game Design) — The Elemental Tetrad: AI SDK (technology) + game mechanics (agent patterns) + colony narrative (story) + terminal aesthetic. LLM non-determinism IS the Chance mechanic. Token budgets ARE the Economy.

Van Merriënboer (Ten Steps to Complex Learning) — Whole-task from day 1. Scaffolding fades. Supportive information before tasks. Procedural information just-in-time. Sequence driven by what breaks.

The design lens is Dwarf Fortress. Colony management with autonomous actors. The operator manages systems, not individual agents. Emergent narrative from simple rules. Losing is the best teacher.

The course shouldn't feel like "learning agent architecture" any more than chess feels like "learning spatial reasoning." The patterns are absorbed through play.