Tools like Cursor, Copilot, and Claude Code all let you choose between models. Cursor even has an "Auto" mode and its own Composer model. Copilot supports Claude, GPT, and Gemini. But in all of them, you pick a model (or let the tool pick one for you) and that model runs everything in your session. There's no way to say "use Opus for planning, Haiku for file search, and Gemini for screenshots" and have that happen automatically.
That's what I wanted. Planning and reasoning need a strong model. File exploration doesn't. Screenshot analysis doesn't. Boilerplate generation doesn't. Running the same expensive model for all of these wastes tokens.
I use a stack that does this routing automatically. I configure which model handles which task once, and every agent gets the right model without me switching anything mid-session. The Vercel AI Gateway serves the models through a single endpoint, and oh-my-opencode handles the orchestration.
Here's how I set it up and how it works in practice.
This guide covers four tools that work together:
- OpenCode: An open-source AI coding agent that runs in your terminal. It's model-agnostic and supports plugins. Think of it as the foundation.
- oh-my-opencode: A plugin that turns OpenCode into a multi-agent system. It adds specialized agents (planner, researcher, debugger) and lets you assign different models to each one.
- Vercel AI Gateway: A single API endpoint that lets you call models from Anthropic, OpenAI, Google, and 40+ other providers with one API key. No separate accounts or billing per provider.
- agent-browser: A browser automation CLI by Vercel. Your agents use it to test web flows, take screenshots, and verify changes in a real browser.
The layers: OpenCode runs the session. oh-my-opencode adds orchestration. The AI Gateway routes model calls. agent-browser handles anything that needs a browser.
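Stacked up, that looks like:

```
you
 └─ OpenCode (runs the terminal session)
     └─ oh-my-opencode (specialized agents + model routing)
         ├─ Vercel AI Gateway (one endpoint, one key -> Anthropic, OpenAI, Google, ...)
         └─ agent-browser (browser checks, screenshots, web flows)
```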
You'll need:
- A Vercel account (for AI Gateway access)
- Node.js 18+ and npm
- Homebrew (on macOS) or another way to install CLI tools
- A terminal you're comfortable in
OpenCode is open source, built by the team at Anomaly. I install it with Homebrew:
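```sh
# Formula name as of this writing; the OpenCode docs list the current tap if it moves
brew install sst/tap/opencode
```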
Verify it works with opencode --version. See the OpenCode docs for other install methods (npm, curl, etc.).
OpenCode stores its config at ~/.config/opencode/opencode.json by default. I rename mine to opencode.jsonc so I can add comments. OpenCode reads both formats. The examples in this guide use .jsonc for the same reason. I keep the whole config folder in a dotfiles repo and symlink it.
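If you do the same, the symlink is a one-liner (assuming the repo lives at ~/dotfiles):

```sh
# Point ~/.config/opencode at the copy tracked in your dotfiles repo
ln -sfn ~/dotfiles/opencode ~/.config/opencode
```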
oh-my-opencode is a plugin that turns OpenCode into a multi-agent system. The recommended way to install it is to let an agent do it for you. Paste this prompt into an OpenCode session:
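```
Install the oh-my-opencode plugin for me: find its official installation
instructions, follow them, and confirm the plugin is registered in my OpenCode
config when you're done.
```

That's a paraphrase rather than the canonical prompt from the oh-my-opencode README; any wording that points the agent at the install instructions does the job.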
The agent will run the CLI installer and ask which model providers you have (Claude, OpenAI, Gemini, GitHub Copilot). Since this guide uses the Vercel AI Gateway as the provider for all models, you can say "no" to all of these. The installer still registers the plugin and sets up the agent structure. You'll configure the actual model routing through the AI Gateway in Step 3.
If you'd rather do it yourself, run the interactive installer:
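```sh
# Invocation may differ by version; the oh-my-opencode install guide has the current command
npx oh-my-opencode install
```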
This walks you through provider selection and writes the config files. Since we're using the Vercel AI Gateway, decline all provider subscriptions when the installer asks.
Either way, verify it worked by opening OpenCode. You should see additional agents available. Press Tab to cycle through them.
oh-my-opencode does a lot out of the box. The feature we care about here is assigning different models to different agents and task categories.
Reference: oh-my-opencode installation guide
This is where multi-model routing happens. The Vercel AI Gateway gives you a single endpoint for models from Anthropic, OpenAI, Google, and 40+ other providers. You use one API key and specify models in the format provider/model-name.
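You can see the whole idea in one request against the Gateway's OpenAI-compatible endpoint. The model slug below is one of this guide's illustrative IDs, so double-check the endpoint and model list against the Gateway docs:

```sh
curl https://ai-gateway.vercel.sh/v1/chat/completions \
  -H "Authorization: Bearer $AI_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-haiku-4.5",
    "messages": [{"role": "user", "content": "Say hi in five words."}]
  }'
```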
Here's what my Vercel provider block looks like in ~/.config/opencode/opencode.jsonc:
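```jsonc
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "vercel": {
      // One provider entry for the whole Gateway. Field names follow OpenCode's
      // custom-provider schema; the model slugs are illustrative, so use the IDs
      // from the Gateway's model catalog.
      "npm": "@ai-sdk/gateway",
      "name": "Vercel AI Gateway",
      "options": {
        "apiKey": "{env:AI_GATEWAY_API_KEY}"
      },
      "models": {
        "anthropic/claude-opus-4.6": { "name": "Claude Opus 4.6" },
        "anthropic/claude-sonnet-4.6": { "name": "Claude Sonnet 4.6" },
        "anthropic/claude-haiku-4.5": { "name": "Claude Haiku 4.5" },
        "openai/gpt-5.2-codex": { "name": "GPT-5.2 Codex" },
        "google/gemini-3-flash": { "name": "Gemini 3 Flash" }
      }
    }
  }
}
```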
Each entry in the models block registers a model that OpenCode (and oh-my-opencode) can use. The key is the model identifier in provider/model-name format. Adding a new model is just adding a line.
Pricing: the AI Gateway charges tokens at the upstream provider's list price with zero markup. Every Vercel account gets $5 in free credits that reset every 30 days, with no restrictions on which models you can use. That's enough to run through this guide and test the full multi-model setup before committing. You can also bring your own API keys if you already have provider accounts.
Reference: Vercel AI Gateway documentation
agent-browser is a browser automation CLI built for AI agents. It uses semantic element references instead of raw DOM trees, which cuts context token usage by roughly 93%.
Install it as an OpenCode skill by running its installer (the agent-browser docs have the current command).
Select "Copy" mode when prompted. This copies the skill files into your OpenCode config rather than symlinking them, which is more reliable if you manage your config through dotfiles.
You'll rarely call agent-browser directly. The agents invoke it automatically when they need to verify something in a browser, take a screenshot, or test a web flow.
When a single model handles everything, you're paying the same rate for a codebase search as you are for architecture planning. That means either overspending on simple tasks or underperforming on complex ones.
Multi-model routing lets you assign models based on what each task actually needs:
| Tier | Models | Best for |
|---|---|---|
| Expensive (high reasoning) | Opus 4.6, GPT-5.2 Codex | Planning, architecture, complex debugging |
| Balanced (workhorse) | Sonnet 4.6 | General coding, research, documentation search |
| Cheap (fast) | Haiku 4.5, Gemini 3 Flash | Quick tasks, screenshot analysis, boilerplate |
Opus is roughly 10-20x more expensive per token than Haiku. If an explore agent runs 50 times during a refactoring session, routing it through Haiku instead of Opus saves real money. And the exploration quality is the same because that task doesn't need deep reasoning.
oh-my-opencode lets you assign models at two levels: per agent and per task category. I configure both in ~/.config/opencode/oh-my-opencode.jsonc.
Each specialized agent gets the model that fits its job:
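```jsonc
{
  // Per-agent overrides. Agent keys and nesting can change between
  // oh-my-opencode versions, so treat this as the shape rather than the
  // exact schema.
  "agents": {
    "sisyphus":   { "model": "vercel/anthropic/claude-opus-4.6" },   // orchestrator
    "oracle":     { "model": "vercel/anthropic/claude-opus-4.6" },   // debugging, architecture
    "hephaestus": { "model": "vercel/openai/gpt-5.2-codex" },        // autonomous deep worker
    "explore":    { "model": "vercel/anthropic/claude-sonnet-4.6" }, // codebase search
    "librarian":  { "model": "vercel/anthropic/claude-sonnet-4.6" }, // docs lookup
    "looker":     { "model": "vercel/google/gemini-3-flash" }        // screenshots, PDFs
  }
}
```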
The logic: Sisyphus orchestrates everything, so it gets the most capable model. Oracle handles debugging and architecture, which also needs deep reasoning. Hephaestus is the autonomous deep worker. You give it a goal and it figures out the steps itself, so it gets a strong reasoning model too. Explore and Librarian just search codebases and documentation, so they run on the balanced tier. The multimodal looker analyzes screenshots and PDFs, so it uses Gemini Flash, which handles vision tasks well at low cost.
oh-my-opencode also routes delegated tasks by category. When Sisyphus delegates a subtask, it picks a category, and that category maps to a model:
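```jsonc
{
  // Category -> model routing for delegated subtasks. The category names are
  // placeholders to show the idea; use the ones the oh-my-opencode docs define.
  "categories": {
    "deep-reasoning": { "model": "vercel/openai/gpt-5.2-codex" },        // tricky logic, security-sensitive code
    "general":        { "model": "vercel/anthropic/claude-sonnet-4.6" }, // ordinary coding and research
    "quick":          { "model": "vercel/anthropic/claude-haiku-4.5" },  // mechanical edits, boilerplate
    "vision":         { "model": "vercel/google/gemini-3-flash" }        // screenshots, visual checks
  }
}
```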
This means a single refactoring session might touch four or five different models without you doing anything. Sisyphus decides what each subtask needs, picks the category, and the right model handles it.
These are my assignments as of February 2026. The config schema may change, so check the oh-my-opencode docs if something looks off.
Here's how this looks in practice. I needed to refactor an authentication module from session-based auth to JWT tokens. I opened OpenCode and typed:
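```
ulw Refactor the authentication module from session-based auth to JWT tokens.
Keep the existing login and logout flows working.
```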
ulw is short for "ultrawork." Adding it to your prompt tells oh-my-opencode to use full orchestration: parallel background agents, automatic task delegation, and continuous execution until the job is done. Here's what happened next.
Sisyphus, the main orchestrator, received the prompt. This is a complex architectural change, so it needed context before making decisions. It fired two background agents in parallel:
- Explore (Sonnet 4.6): "Find all files related to authentication. Map the current session flow."
- Librarian (Sonnet 4.6): "Search for JWT implementation patterns in the project's framework."
Both ran simultaneously. While they worked, Sisyphus analyzed the high-level structure of the request. This is why the orchestrator needs a top-tier model: it's making strategic decisions about how to break down the work.
Within seconds, the background agents returned their findings. Explore mapped 12 files involved in the auth flow. Librarian found the framework's recommended JWT pattern in the official docs.
These agents don't need Opus-level reasoning. They're doing search and retrieval, which Sonnet handles well at a fraction of the cost.
With full context, Sisyphus (still running on Opus 4.6) created an implementation plan and started handing subtasks to other models. The JWT token generation logic went to a GPT-5.2 Codex worker because the cryptographic code needed careful reasoning. The middleware updates went to a Haiku 4.5 worker since they were straightforward pattern replacements. Sisyphus picked the model for each subtask based on its category. No manual switching involved.
After implementation, Sisyphus invoked agent-browser to verify the login flow still worked.
The browser test confirmed that login, token generation, and protected routes all worked correctly.
The session used five different models. Opus and GPT-5.2 Codex only ran for the tasks that needed them. Everything else ran on cheaper models. Same result, roughly 70% lower cost.
I don't track exact per-token costs because model pricing changes frequently. But the math is straightforward.
A session like the refactoring walkthrough above might cost around $3 in tokens if run entirely on Opus. With multi-model routing, the same session costs closer to $0.80. The explore agents, librarian lookups, and simple file edits all run on cheaper models. Only the planning and complex logic hit the expensive ones.
The Vercel AI Gateway passes through provider pricing with zero markup. There's a free tier with $5/month in credits that resets every 30 days after your first request. Enough to try the setup and run a few sessions before you decide to commit.
The exact numbers will vary. The point is you stop paying premium prices for tasks that don't need premium models.
You now have a setup where different agents use different models based on what each task needs. From here:
- Customize the model assignments in ~/.config/opencode/oh-my-opencode.jsonc to match your preferences
- Try the ulw keyword on a real task in your own project and watch the agents coordinate
- Experiment with which models work best for which categories in your workflow
- Add more models to your ~/.config/opencode/opencode.jsonc provider block as new ones become available
Official documentation: