
Anthropic Messages API

Last updated March 7, 2026

AI Gateway provides Anthropic Messages API endpoints, so you can use the Anthropic SDK and tools like Claude Code through a unified gateway with only a URL change.

The AI Gateway endpoint implements the same specification as Anthropic's own Messages API, so existing Anthropic clients work with only a base URL change.

For more on using AI Gateway with Claude Code, see the Claude Code instructions.

The Anthropic Messages API is available at the following base URL:

https://ai-gateway.vercel.sh

The Anthropic Messages API supports the same authentication methods as the main AI Gateway:

  • API key: Use your AI Gateway API key with the x-api-key header or Authorization: Bearer <token> header
  • OIDC token: Use your Vercel OIDC token with the Authorization: Bearer <token> header

You only need one of these forms of authentication. If an API key is specified, it takes precedence over any OIDC token, even when the API key is invalid.
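The precedence rule can be sketched with a small helper (hypothetical, not part of any SDK) that picks between the two header forms:

```typescript
// Hypothetical helper illustrating the gateway's auth precedence:
// an API key (x-api-key) wins over an OIDC token (Bearer), even when both are set.
function gatewayAuthHeaders(opts: {
  apiKey?: string;
  oidcToken?: string;
}): Record<string, string> {
  if (opts.apiKey) {
    // API key via the Anthropic-style x-api-key header
    return { 'x-api-key': opts.apiKey };
  }
  if (opts.oidcToken) {
    // OIDC token via a standard Bearer token
    return { Authorization: `Bearer ${opts.oidcToken}` };
  }
  throw new Error('Set AI_GATEWAY_API_KEY or VERCEL_OIDC_TOKEN');
}

// When both are provided, only the API key header is sent.
const headers = gatewayAuthHeaders({ apiKey: 'vck_...', oidcToken: 'eyJ...' });
// headers is { 'x-api-key': 'vck_...' }
```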

The AI Gateway supports the Anthropic Messages API endpoint POST /v1/messages.

For advanced features such as prompt caching, see the sections below.

Claude Code is Anthropic's agentic coding tool. You can configure it to use Vercel AI Gateway, enabling you to:

  • Route requests through multiple AI providers
  • Monitor traffic and spend in your AI Gateway Overview
  • View detailed traces in Vercel Observability under AI
  • Use any model available through the gateway
  1. Configure Claude Code to use the AI Gateway by setting these environment variables:

    • ANTHROPIC_BASE_URL: https://ai-gateway.vercel.sh
    • ANTHROPIC_AUTH_TOKEN: your AI Gateway API key
    • ANTHROPIC_API_KEY: "" (empty string)

    Setting ANTHROPIC_API_KEY to an empty string is important. Claude Code checks this variable first, and if it's set to a non-empty value, it will use that instead of ANTHROPIC_AUTH_TOKEN.

    Add this alias to your ~/.zshrc (or ~/.bashrc):

    alias claude-vercel='ANTHROPIC_BASE_URL="https://ai-gateway.vercel.sh" ANTHROPIC_AUTH_TOKEN="your-api-key-here" ANTHROPIC_API_KEY="" claude'

    Then reload your shell:

    source ~/.zshrc

    For more flexibility (e.g., adding additional logic), create a wrapper script at ~/bin/claude-vercel:

    claude-vercel
    #!/usr/bin/env bash
    # Routes Claude Code through Vercel AI Gateway
     
    ANTHROPIC_BASE_URL="https://ai-gateway.vercel.sh" \
    ANTHROPIC_AUTH_TOKEN="your-api-key-here" \
    ANTHROPIC_API_KEY="" \
    claude "$@"

    Make it executable and ensure ~/bin is in your PATH:

    mkdir -p ~/bin
    chmod +x ~/bin/claude-vercel
    echo 'export PATH="$HOME/bin:$PATH"' >> ~/.zshrc
    source ~/.zshrc
  2. Run claude-vercel to start Claude Code with AI Gateway:

    claude-vercel

    Your requests will now be routed through Vercel AI Gateway.

You can use the AI Gateway's Anthropic Messages API with the official Anthropic SDK. Point your client to the AI Gateway's base URL and use your AI Gateway API key or OIDC token for authentication.

The examples and content in this section are not comprehensive. For complete documentation on available parameters, response formats, and advanced features, refer to the Anthropic Messages API documentation.

client.ts
import Anthropic from '@anthropic-ai/sdk';
 
const anthropic = new Anthropic({
  apiKey: process.env.AI_GATEWAY_API_KEY,
  baseURL: 'https://ai-gateway.vercel.sh',
});
 
const message = await anthropic.messages.create({
  model: 'anthropic/claude-sonnet-4.6',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello, world!' }],
});
client.py
import os
import anthropic
 
client = anthropic.Anthropic(
    api_key=os.getenv('AI_GATEWAY_API_KEY'),
    base_url='https://ai-gateway.vercel.sh'
)
 
message = client.messages.create(
    model='anthropic/claude-sonnet-4.6',
    max_tokens=1024,
    messages=[
        {'role': 'user', 'content': 'Hello, world!'}
    ]
)

The messages endpoint supports the following parameters:

  • model (string): The model to use (e.g., anthropic/claude-sonnet-4.6)
  • max_tokens (integer): Maximum number of tokens to generate
  • messages (array): Array of message objects with role and content fields
  • stream (boolean): Whether to stream the response. Defaults to false
  • temperature (number): Controls randomness in the output. Range: 0-1
  • top_p (number): Nucleus sampling parameter. Range: 0-1
  • top_k (integer): Top-k sampling parameter
  • stop_sequences (array): Stop sequences for the generation
  • tools (array): Array of tool definitions for function calling
  • tool_choice (object): Controls which tools are called
  • thinking (object): Extended thinking configuration
  • system (string or array): System prompt
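As a sketch of how a streaming response (stream: true) is consumed, the helper below accumulates text from delta events. The StreamEvent type and mockStream generator are simplified stand-ins for the SDK's real types; in real use you would pass the async iterable returned by anthropic.messages.create({ stream: true, ... }).

```typescript
// Minimal event shape for Anthropic's streaming deltas (subset of the SDK's types).
type StreamEvent = {
  type: string;
  delta?: { type: string; text?: string };
};

// Accumulate the assistant's text from a streaming response.
async function collectText(stream: AsyncIterable<StreamEvent>): Promise<string> {
  let text = '';
  for await (const event of stream) {
    // Text arrives incrementally in content_block_delta / text_delta events.
    if (event.type === 'content_block_delta' && event.delta?.type === 'text_delta') {
      text += event.delta.text ?? '';
    }
  }
  return text;
}

// Mock stream demonstrating the event flow (real streams interleave other
// event types such as message_start and content_block_stop).
async function* mockStream(): AsyncGenerator<StreamEvent> {
  yield { type: 'message_start' };
  yield { type: 'content_block_delta', delta: { type: 'text_delta', text: 'Hello' } };
  yield { type: 'content_block_delta', delta: { type: 'text_delta', text: ', world!' } };
  yield { type: 'message_stop' };
}

// collectText(mockStream()) resolves to 'Hello, world!'
```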

The gateway passes the cache_control parameter through to Anthropic's prompt caching feature. This is explicit caching: you specify cache breakpoints, and Anthropic handles storing and reusing cached content automatically.

The cache_control parameter is passed through to Anthropic and Vertex AI Anthropic models for explicit caching. Other providers or models with implicit caching work automatically without any configuration.

Example request
caching.ts
import Anthropic from '@anthropic-ai/sdk';
 
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
 
const anthropic = new Anthropic({
  apiKey,
  baseURL: 'https://ai-gateway.vercel.sh',
});
 
const message = await anthropic.messages.create({
  model: 'anthropic/claude-sonnet-4.5',
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: 'You are a helpful assistant that analyzes documents.',
    },
    {
      type: 'text',
      text: longDocumentContent, // Large content to cache
      cache_control: { type: 'ephemeral' },
    },
  ],
  messages: [
    {
      role: 'user',
      content: 'Summarize the key points from this document.',
    },
  ],
});
 
console.log(message.usage);
// {
//   input_tokens: 50,
//   output_tokens: 200,
//   cache_creation_input_tokens: 10000,  // Tokens written to cache
//   cache_read_input_tokens: 0           // Tokens read from cache
// }
caching.py
import os
import anthropic
 
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
 
client = anthropic.Anthropic(
    api_key=api_key,
    base_url='https://ai-gateway.vercel.sh'
)
 
message = client.messages.create(
    model='anthropic/claude-sonnet-4.5',
    max_tokens=1024,
    system=[
        {
            'type': 'text',
            'text': 'You are a helpful assistant that analyzes documents.',
        },
        {
            'type': 'text',
            'text': long_document_content,  # Large content to cache
            'cache_control': {'type': 'ephemeral'},
        },
    ],
    messages=[
        {
            'role': 'user',
            'content': 'Summarize the key points from this document.'
        }
    ],
)
 
print(message.usage)
# {
#   'input_tokens': 50,
#   'output_tokens': 200,
#   'cache_creation_input_tokens': 10000,  # Tokens written to cache
#   'cache_read_input_tokens': 0           # Tokens read from cache
# }

Add cache_control: { type: 'ephemeral' } to mark content that should be cached. You can place cache breakpoints on system messages, user message content, tool definitions, tool results, and assistant message content. Anthropic also supports automatic caching, where a single top-level cache_control field automatically applies to the last cacheable block.

For the full list of cacheable locations and automatic caching details, see the Anthropic prompt caching docs.

  • First request: Content up to the breakpoint is cached (cache_creation_input_tokens)
  • Subsequent requests: Matching prefixes are read from cache (cache_read_input_tokens)
  • TTL: Cached content expires after 5 minutes, refreshed on each cache hit
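To make the usage fields concrete, here is a rough cost sketch. The 1.25x write and 0.1x read multipliers are assumptions based on Anthropic's published pricing for the 5-minute cache; verify them against the current pricing page before relying on them.

```typescript
// Usage fields returned by the Messages API (subset).
type Usage = {
  input_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
};

// Assumed cost model: 5-minute cache writes bill at 1.25x the base input
// rate and cache reads at 0.1x. Returns the billed-equivalent input tokens.
function billedInputTokens(usage: Usage): number {
  const writes = usage.cache_creation_input_tokens ?? 0;
  const reads = usage.cache_read_input_tokens ?? 0;
  return usage.input_tokens + 1.25 * writes + 0.1 * reads;
}

// First request: writes 10,000 tokens to cache.
billedInputTokens({ input_tokens: 50, cache_creation_input_tokens: 10000, cache_read_input_tokens: 0 });
// → 12550
// Follow-up within the TTL: reads them back at a tenth of the base rate.
billedInputTokens({ input_tokens: 50, cache_creation_input_tokens: 0, cache_read_input_tokens: 10000 });
// → 1050
```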

The API returns standard HTTP status codes and error responses:

  • 400 Bad Request: Invalid request parameters
  • 401 Unauthorized: Invalid or missing authentication
  • 403 Forbidden: Insufficient permissions
  • 404 Not Found: Model or endpoint not found
  • 429 Too Many Requests: Rate limit exceeded
  • 500 Internal Server Error: Server error
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "Invalid request: missing required parameter 'max_tokens'"
  }
}
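A minimal sketch of client-side handling for these codes. The helper names are hypothetical, and the retry policy is an assumption (retry only rate limits and server errors, since 4xx client errors will not succeed on retry):

```typescript
// Shape of an Anthropic-style error body, matching the example above.
type ApiErrorBody = {
  type: 'error';
  error: { type: string; message: string };
};

// Assumption: only 429 and 5xx are transient and worth retrying.
function isRetryable(status: number): boolean {
  return status === 429 || status >= 500;
}

// Format a failed response for logging.
function describeError(status: number, body: ApiErrorBody): string {
  return `${status} ${body.error.type}: ${body.error.message}`;
}
```

With the 400 body shown above, describeError yields a single log line like "400 invalid_request_error: Invalid request: missing required parameter 'max_tokens'", while isRetryable(400) returns false so the request is not retried.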
