
GPT-4o mini

openai/gpt-4o-mini

GPT-4o mini is OpenAI's cost-efficient multimodal model. Priced at $0.15 per million input tokens, it is cheaper than GPT-3.5 Turbo while outperforming GPT-4 on chat-preference benchmarks, and it supports vision input and function calling.

File Input · Tool Use · Vision (Image) · Implicit Caching
index.ts
import { streamText } from 'ai';

const result = streamText({
  model: 'openai/gpt-4o-mini',
  prompt: 'Why is the sky blue?',
});

// Print the response as it streams in
for await (const part of result.textStream) {
  process.stdout.write(part);
}

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

  • Authentication

    AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

For applications that chain multiple model calls (classify, then extract, then format), GPT-4o mini's per-call cost makes it practical to run several sequential inferences per user request without the economics becoming prohibitive.

When to Use GPT-4o mini

Best For

  • Customer support chatbots:

    Live interaction features requiring fast, affordable multi-turn responses

  • Multi-call pipelines:

    Sequential or parallel model calls per user action where per-call cost accumulates quickly

  • Budget vision workflows:

    Image description, document OCR assistance, and visual classification at the small-model tier

  • Function-calling agents:

    Reliable tool invocation at low cost per call

  • Large conversation histories:

    Processing codebases and extended chats within the context window of 128K tokens at minimal cost

Consider Alternatives When

  • Higher quality ceiling:

    GPT-4o or GPT-4.1 handle complex reasoning, nuanced writing, or difficult coding tasks better

  • Advanced multimodal processing:

    More capable vision or audio workloads require a larger model

  • Deep chain-of-thought:

OpenAI's o1-mini is purpose-built for extended reasoning

Conclusion

GPT-4o mini arrived as the model that made it economically viable to embed language model capability into every layer of an application: not just the final user-facing response, but the classification, routing, extraction, and tool-use steps throughout a pipeline. Its combination of low price, multimodal input, function calling, and a 128K-token context window covers the majority of high-volume production use cases through AI Gateway.

FAQ

How is GPT-4o mini priced through AI Gateway?

Pricing appears on this page and updates as providers adjust their rates. AI Gateway routes traffic through the configured provider.

Does GPT-4o mini support image inputs?

Yes. It supports vision alongside text, enabling image analysis, document processing, and visual classification at the small-model cost tier.

How does GPT-4o mini score on benchmarks?

82.0% on MMLU, outperforming comparable small models and topping GPT-4 on the LMSYS Chatbot Arena chat preference leaderboard at launch.

Does GPT-4o mini support function calling?

Yes. Function calling is supported, and OpenAI highlighted agentic pipelines that call external APIs as one of the key intended use cases.

What is GPT-4o mini's context window?

128K tokens, providing ample space for conversation histories, long codebases, and extended document processing.

What is the difference between gpt-4o-mini and gpt-4o-mini-2024-07-18?

The alias gpt-4o-mini points to the current recommended version and may be updated. The dated snapshot gpt-4o-mini-2024-07-18 is pinned to the specific July 18, 2024 release.

Where do the performance metrics on this page come from?

This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.