
Model Fallbacks and Usage Tracking

A single model going down shouldn't take your app with it. And if you don't track token usage, you'll find out about costs from your invoice instead of your dashboard.

Outcome

Create a centralized AI provider configuration with automatic model fallbacks and token usage logging.

Fast Track

  1. Configure fallback models in the AI Gateway dashboard (the preferred approach)
  2. Create src/lib/ai/provider.ts with a shared AI Gateway client
  3. Add a wrapLanguageModel middleware that logs token usage

Gateway-Level Fallbacks (Preferred)

The easiest way to add model fallbacks is in the AI Gateway itself, not in your code. In the Vercel dashboard:

  1. Go to your project Settings → AI Gateway
  2. Select your primary model (anthropic/claude-sonnet-4)
  3. Add a fallback model (anthropic/claude-haiku-4.5)
  4. Set conditions: timeout threshold, error codes that trigger fallback

When the primary model is unavailable or slow, the gateway automatically routes to the fallback. Your application code doesn't change at all. No try/catch, no retry logic, no second model configuration. The gateway handles it at the infrastructure level.

This is the approach we recommend for most production apps. Code-level fallbacks (shown later in the Advanced section) are there for cases where you need fine-grained control, like falling back only for specific endpoints or adjusting the prompt for a different model.
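Conceptually, what the gateway does on a failed request boils down to a try-primary-then-fallback wrapper. Here's a dependency-free TypeScript sketch of that idea (illustrative only; the real routing happens inside the gateway's infrastructure, not in your code):

```typescript
// Illustrative: the shape of "route to the fallback on failure".
// The gateway performs this server-side; your app never writes this.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>
): Promise<T> {
  try {
    return await primary();
  } catch (error) {
    console.warn('[Fallback] primary failed, routing to fallback:', error);
    return fallback();
  }
}

// Both thunks stand in for calls to different models.
(async () => {
  const reply = await withFallback(
    async () => { throw new Error('primary model unavailable'); },
    async () => 'response from fallback model'
  );
  console.log(reply);
})();
```

The dashboard configuration layers timeout thresholds and error-code matching on top of this basic shape.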

Why Centralize the Provider?

Right now, each endpoint creates its own gateway client:

// In api/chat/+server.ts
const gateway = createGateway({ apiKey: AI_GATEWAY_API_KEY });
 
// In api/parse-alert/+server.ts (same thing, duplicated)
const gateway = createGateway({ apiKey: AI_GATEWAY_API_KEY });

A centralized provider means one place to:

  • Configure the API key
  • Add usage tracking middleware
  • Define fallback models
  • Adjust settings across all endpoints
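The underlying mechanism is just a module-level singleton: a module body runs once per process, so every importer shares the same instance. A dependency-free sketch of the pattern (the generic `createClient` here is a stand-in for the gateway client, not part of the exercise solution):

```typescript
// Illustrative singleton pattern; the real module wraps the AI Gateway client.
type Client = { apiKey: string };

function createClient(apiKey: string): Client {
  console.log('creating client'); // runs once per process, not per request
  return { apiKey };
}

// Module scope: evaluated a single time, shared by every importer.
const client = createClient('secret-from-env');

// In the real provider module this would be an exported function.
function getClient(): Client {
  return client; // same instance for every endpoint
}
```

Because every endpoint goes through `getClient()`, changing the key, model, or middleware in this one file changes it everywhere.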

Hands-on exercise 2.4

Let's create a centralized AI provider with usage tracking and model fallbacks:

Requirements:

  1. Configure fallback models in the AI Gateway dashboard
  2. Create src/lib/ai/provider.ts
  3. Use wrapLanguageModel to add middleware that logs input/output token counts
  4. Export a getModel() function that returns a wrapped model instance
  5. Update the chat and parse-alert endpoints to use the shared provider

Implementation hints:

  • Set up gateway-level fallbacks first (dashboard config, no code needed)
  • wrapLanguageModel from the ai package wraps any model with middleware hooks
  • The middleware object needs specificationVersion: 'v3'
  • wrapGenerate intercepts generateText() calls, wrapStream intercepts streamText() calls. You need both to cover all endpoints
  • Token usage is available as result.usage.inputTokens.total and result.usage.outputTokens.total
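The wrapStream hint deserves a closer look: the trick is piping the model stream through a TransformStream that forwards every chunk untouched while remembering the usage chunk, then logging in flush() once the stream ends. Here is the pattern in isolation, using plain Web Streams and mock chunk shapes (no AI SDK involved):

```typescript
// Mock chunk shapes standing in for model stream parts.
type Chunk =
  | { type: 'text'; value: string }
  | { type: 'usage'; value: { inputTokens: number; outputTokens: number } };

function captureUsage(
  source: ReadableStream<Chunk>,
  onUsage: (u: { inputTokens: number; outputTokens: number }) => void
): ReadableStream<Chunk> {
  let usage: { inputTokens: number; outputTokens: number } | undefined;
  return source.pipeThrough(
    new TransformStream<Chunk, Chunk>({
      transform(chunk, controller) {
        if (chunk.type === 'usage') usage = chunk.value; // remember, don't consume
        controller.enqueue(chunk); // pass every chunk through unchanged
      },
      flush() {
        if (usage) onUsage(usage); // stream is done: safe to report once
      }
    })
  );
}
```

The consumer still sees every chunk, so streaming to the client is unaffected; the usage callback fires exactly once, after the last chunk.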

Try It

  1. Send a chat message and check server logs:

    What's the weather like at Mammoth?
    

    Server logs should show:

    [AI Usage] Model: anthropic/claude-sonnet-4
    [AI Usage] Input tokens: 245
    [AI Usage] Output tokens: 89
    [AI Usage] Total tokens: 334
    
  2. Test the parse-alert endpoint (also uses the shared provider):

    $ curl -X POST http://localhost:5173/api/parse-alert \
      -H "Content-Type: application/json" \
      -d '{"query": "powder at Grand Targhee"}'

    Server logs should show usage for this request too.

  3. Verify fallback behavior:

    Temporarily change the primary model to an invalid name and verify the fallback model handles the request.

Commit

git add -A
git commit -m "feat(ai): centralize provider with usage tracking and fallbacks"
git push

Done-When

  • Fallback model is configured in the AI Gateway dashboard
  • src/lib/ai/provider.ts exports a getModel() function
  • Token usage is logged for every AI request
  • Chat and parse-alert endpoints use the shared provider instead of their own clients

Solution

src/lib/ai/provider.ts
import { createGateway, wrapLanguageModel } from 'ai';
import { AI_GATEWAY_API_KEY } from '$env/static/private';
 
const gateway = createGateway({
  apiKey: AI_GATEWAY_API_KEY
});
 
const PRIMARY_MODEL = 'anthropic/claude-sonnet-4';
const FALLBACK_MODEL = 'anthropic/claude-haiku-4.5';
 
function logUsage(
  modelId: string,
  usage: { inputTokens: { total?: number }; outputTokens: { total?: number } }
) {
  const input = usage.inputTokens.total ?? 0;
  const output = usage.outputTokens.total ?? 0;
  console.log(`[AI Usage] Model: ${modelId}`);
  console.log(`[AI Usage] Input tokens: ${input}`);
  console.log(`[AI Usage] Output tokens: ${output}`);
  console.log(`[AI Usage] Total tokens: ${input + output}`);
}
 
type Usage = Parameters<typeof logUsage>[1];
 
function withUsageTracking(model: ReturnType<typeof gateway>) {
  return wrapLanguageModel({
    model,
    middleware: {
      specificationVersion: 'v3',
      wrapGenerate: async ({ doGenerate }) => {
        const result = await doGenerate();
        if (result.usage) logUsage(model.modelId, result.usage);
        return result;
      },
      wrapStream: async ({ doStream }) => {
        const { stream, ...rest } = await doStream();
        let usage: Usage | undefined;
 
        return {
          stream: stream.pipeThrough(
            new TransformStream({
              transform(chunk, controller) {
                if (chunk.type === 'usage') usage = chunk.value;
                controller.enqueue(chunk);
              },
              flush() {
                if (usage) logUsage(model.modelId, usage);
              }
            })
          ),
          ...rest
        };
      }
    }
  });
}
 
export function getModel() {
  return withUsageTracking(gateway(PRIMARY_MODEL));
}
 
export function getFallbackModel() {
  return withUsageTracking(gateway(FALLBACK_MODEL));
}
 
export { PRIMARY_MODEL, FALLBACK_MODEL };

Updated chat endpoint using the shared provider:

src/routes/api/chat/+server.ts
import { getModel } from '$lib/ai/provider';
import { streamText, tool, stepCountIs } from 'ai';
import { valibotSchema } from '@ai-sdk/valibot';
import { resorts } from '$lib/data/resorts';
import { CreateAlertToolInputSchema } from '$lib/schemas/alert';
import type { RequestHandler } from './$types';
 
// Remove the local anthropic client. Use getModel() instead
 
export const POST: RequestHandler = async ({ request }) => {
  const { message } = await request.json();
 
  const resortList = resorts
    .map((r) => `- ${r.name} (id: ${r.id})`)
    .join('\n');
 
  const result = streamText({
    model: getModel(), // Uses the centralized, tracked model
    system: `You are a helpful ski conditions assistant...`,
    messages: [{ role: 'user', content: message }],
    tools: {
      create_alert: tool({ /* ... same as before */ })
    },
    stopWhen: stepCountIs(3)
  });
 
  // ... rest of the SSE stream logic unchanged
};

Updated parse-alert endpoint: The same change applies. Replace the local createGateway and gateway(...) call with getModel():

src/routes/api/parse-alert/+server.ts
import { generateText, Output } from 'ai';
import { valibotSchema } from '@ai-sdk/valibot';
import * as v from 'valibot';
import { resorts } from '$lib/data/resorts';
import { CreateAlertToolInputSchema, AlertConditionSchema } from '$lib/schemas/alert';
import type { RequestHandler } from './$types';
import { getModel } from '$lib/ai/provider';
 
// Remove the local gateway client. Use getModel() in the generateText call:
//   model: getModel(),

The gateway handles fallbacks at the infrastructure level (configured in the dashboard earlier). The getFallbackModel() export is available for cases where you need explicit code-level control, covered in the Advanced section below.

wrapLanguageModel intercepts the model's lifecycle. wrapGenerate handles generateText() calls (like the parse-alert endpoint), while wrapStream handles streamText() calls (like the chat endpoint). Both hooks need to be present to track usage across all endpoints. The middleware needs specificationVersion: 'v3' in the v6 SDK. Usage data lives on result.usage with inputTokens.total and outputTokens.total. Since all endpoints now use getModel(), tracking and config changes apply everywhere from one file.

Troubleshooting

Token counts show 0 for both input and output

Your middleware may not be wired up correctly. Verify that withUsageTracking is being called and that getModel() returns the wrapped model, not the raw gateway model.

result.usage is undefined

Check that specificationVersion: 'v3' is set in the middleware object. Without it, the v6 SDK won't pass usage data to your hooks.

Advanced: Code-Level Fallbacks

If you need fallback behavior that's more nuanced than the gateway dashboard allows, like adjusting the prompt for a different model or falling back only for specific endpoints, handle it in code. The getFallbackModel() export from the provider gives you a cheaper, faster model:

import { getModel, getFallbackModel } from '$lib/ai/provider';
 
// In your stream's start() function:
async start(controller) {
  try {
    for await (const part of result.fullStream) {
      // ... handle parts
    }
  } catch (error) {
    console.warn('[AI Fallback] Primary stream failed, retrying with fallback');
    const fallbackResult = streamText({ ...options, model: getFallbackModel() });
    for await (const part of fallbackResult.fullStream) {
      // ... handle parts
    }
  }
}

For most apps, the gateway-level approach is simpler and sufficient. Use code-level fallbacks when you need the extra control.

Advanced: Cost Estimation

Add per-request cost estimates to your logs:

// Approximate pricing (check anthropic.com/pricing for current rates)
const PRICING = {
  'anthropic/claude-sonnet-4': { input: 3.0, output: 15.0 }, // per million tokens
  'anthropic/claude-haiku-4.5': { input: 1.0, output: 5.0 }
};
 
function estimateCost(
  modelId: string,
  inputTokens: number,
  outputTokens: number
): string {
  const prices = PRICING[modelId as keyof typeof PRICING];
  if (!prices) return 'unknown';
  const cost =
    (inputTokens / 1_000_000) * prices.input +
    (outputTokens / 1_000_000) * prices.output;
  return `$${cost.toFixed(6)}`;
}

In production, you'd send these metrics to a monitoring service rather than just logging them.