GPT-4o mini

GPT-4o mini is OpenAI's cost-efficient multimodal model, priced at $0.15 per million input tokens, which is cheaper than GPT-3.5 Turbo. It outperforms GPT-4 on chat preference benchmarks and supports vision and function calling.

File Input, Tool Use, Vision (Image), Implicit Caching
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'openai/gpt-4o-mini',
  prompt: 'Why is the sky blue?'
})

Playground

Try out GPT-4o mini by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

| Provider | Legal | Context | Latency | Throughput | Input | Output | Cache Read | Cache Write | Web Search (Per Query) | Release Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Azure | Terms, Privacy | 128K | 0.7s | 91tps | $0.15/M | $0.60/M | $0.07/M | | $14/K + input costs | 07/18/2024 |
| OpenAI | Terms, Privacy | 128K | 0.8s | 52tps | $0.15/M | $0.60/M | $0.07/M | | $10.00/K + input costs | 07/18/2024 |
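To make the provider preference concrete, here is a minimal sketch of building an ordered preference list. It assumes the gateway reads an `order` array under `providerOptions.gateway`, and that the slugs for the two providers listed above are `azure` and `openai`; both are assumptions to confirm against the docs. `preferProviders` is a hypothetical helper, not part of any SDK.

```typescript
// Assumed shape of the gateway routing options; confirm against the docs.
type GatewayProviderOptions = { gateway: { order: string[] } }

// Hypothetical helper: builds an ordered provider preference list.
function preferProviders(...slugs: string[]): GatewayProviderOptions {
  if (slugs.length === 0) {
    throw new Error('at least one provider slug is required')
  }
  return { gateway: { order: [...slugs] } }
}

// Assumed slugs for the providers shown on this page.
const providerOptions = preferProviders('azure', 'openai')

// Intended use (not executed here):
// streamText({ model: 'openai/gpt-4o-mini', prompt: '...', providerOptions })
```

Keeping the preference in one helper makes it easy to change the order in a single place if latency or pricing shifts between providers.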
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by OpenAI

| Model | Context | Latency | Throughput | Input | Output | Cache Read | Cache Write | Web Search (Per Query) | Providers | Release Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 1M | 0.7s | 89tps | $5.00/M | $30.00/M | $0.5/M | | $10.00/K + input costs | Azure, OpenAI | 04/24/2026 |
| | 400K | 1.5s | 252tps | $0.75/M | $4.50/M | $0.07/M | | $10.00/K + input costs | Azure, OpenAI | 03/17/2026 |
| | 400K | 0.6s | 148tps | $0.20/M | $1.25/M | $0.02/M | | $10.00/K + input costs | Azure, OpenAI | 03/17/2026 |
| | 1.1M | 0.6s | 65tps | $2.50/M | $15.00/M | $0.25/M | | $10.00/K + input costs | Azure, OpenAI | 03/05/2026 |
| | 128K | 0.8s | 108tps | $1.25/M | $10.00/M | $0.13/M | | $10.00/K + input costs | Azure, OpenAI | 11/12/2025 |
| | 131K | 0.1s | 418tps | $0.35/M | $0.75/M | $0.25/M | | | Baseten, Bedrock, Cerebras, +5 | 08/05/2025 |

About GPT-4o mini

GPT-4o mini launched on July 18, 2024 as OpenAI's cost-efficient model, positioned to replace GPT-3.5 Turbo for cost-sensitive deployments while providing meaningfully higher capability. The pricing stands out: $0.15 per million input tokens and $0.60 per million output tokens, which is cheaper than GPT-3.5 Turbo. It scored 82.0% on MMLU (Massive Multitask Language Understanding), exceeding GPT-3.5 Turbo, and topped GPT-4 on the LMSYS Chatbot Arena chat preference leaderboard at release.
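As a rough illustration of what these rates mean per request, the sketch below estimates cost from token counts using the listed prices of $0.15 per million input tokens and $0.60 per million output tokens. The function name and the example token counts are hypothetical, and the estimate ignores cache discounts and web search fees.

```typescript
// Listed rates from this page, in USD per million tokens.
const INPUT_USD_PER_MILLION = 0.15
const OUTPUT_USD_PER_MILLION = 0.6

// Hypothetical helper: estimates the cost of one call from its token counts.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1_000_000) * INPUT_USD_PER_MILLION
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_USD_PER_MILLION
  return inputCost + outputCost
}

// Example: a three-call pipeline, each call using 2,000 input tokens
// and 500 output tokens (illustrative numbers only).
const pipelineCostUsd = 3 * estimateCostUsd(2_000, 500)
console.log(pipelineCostUsd.toFixed(4))
```

Under these illustrative numbers the three calls together cost a fraction of a cent, which is why multi-call designs stay affordable at this tier.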

GPT-4o mini supports vision alongside text, inheriting GPT-4o's multimodal design at the small-model tier. You can run cost-efficient image analysis, document processing, visual classification, and screenshot interpretation without routing to a larger model. Function calling support makes it viable as the reasoning layer in tool-using agents and API-calling pipelines.
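A minimal sketch of an image request might look like the following. It assumes the AI SDK message format, where a user message carries a list of content parts with `type: 'text'` and `type: 'image'`. The local types, the `imageQuestion` helper, and the example URL are hypothetical and exist only to keep the sketch self-contained.

```typescript
// Local stand-ins for the message shapes assumed above.
type TextPart = { type: 'text'; text: string }
type ImagePart = { type: 'image'; image: string }
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> }

// Hypothetical helper: pairs a question with one image reference.
function imageQuestion(question: string, imageUrl: string): UserMessage[] {
  return [
    {
      role: 'user',
      content: [
        { type: 'text', text: question },
        { type: 'image', image: imageUrl }
      ]
    }
  ]
}

// Placeholder URL for illustration only.
const messages = imageQuestion(
  'What is the total on this receipt?',
  'https://example.com/receipt.png'
)

// Intended use (not executed here):
// generateText({ model: 'openai/gpt-4o-mini', messages })
```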

OpenAI highlighted four patterns where GPT-4o mini excels: chaining or parallelizing multiple model calls, passing large volumes of context such as full codebases or conversation histories, fast real-time text responses for customer-facing interfaces, and workloads previously blocked by GPT-3.5 Turbo's capability ceiling. The context window of 128K tokens gives it substantial headroom for each of these.

What To Consider When Choosing a Provider

  • Configuration: For applications that chain multiple model calls (classify, then extract, then format), GPT-4o mini's per-call cost makes it practical to run several sequential inferences per user request without the economics becoming prohibitive.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
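The classify, extract, format chain mentioned above can be sketched as three sequential calls. The model call is injected as a function so the flow can be exercised without network access; `ModelCall`, `handleTicket`, and the prompt wording are all hypothetical. In a real application the injected function would wrap a call to `openai/gpt-4o-mini` through the gateway.

```typescript
// Any function that sends a prompt to a model and resolves with its text.
type ModelCall = (prompt: string) => Promise<string>

// Hypothetical pipeline: each step feeds its output into the next prompt.
async function handleTicket(ticket: string, callModel: ModelCall): Promise<string> {
  // Step 1: classify the request.
  const category = await callModel(
    `Classify this support ticket in one word: ${ticket}`
  )
  // Step 2: extract details, using the category as context.
  const details = await callModel(
    `Extract the key details from this ${category.trim()} ticket: ${ticket}`
  )
  // Step 3: format the final reply.
  return callModel(
    `Format these details as a short reply to the customer: ${details}`
  )
}
```

Because every step is a separate inference, the per-request cost is the sum of three small calls, which is where a low per-call price matters most.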

When to Use GPT-4o mini

Best For

  • Customer support chatbots: Live interaction features requiring fast, affordable multi-turn responses
  • Multi-call pipelines: Sequential or parallel model calls per user action where per-call cost accumulates quickly
  • Budget vision workflows: Image description, document OCR assistance, and visual classification at the small-model tier
  • Function-calling agents: Reliable tool invocation at low cost per call
  • Large conversation histories: Processing codebases and extended chats within the context window of 128K tokens at minimal cost
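For long conversation histories, one simple approach is to keep the most recent turns that fit within the 128K-token window. The sketch below assumes a rough heuristic of four characters per token, which is only an approximation; a real tokenizer would give exact counts. The helper names and the default output reservation are hypothetical.

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }

const CONTEXT_WINDOW_TOKENS = 128_000
// Assumption: a crude average, not a real tokenizer.
const CHARS_PER_TOKEN = 4

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN)
}

// Hypothetical helper: keeps the newest messages that fit the budget,
// leaving room for the model's reply.
function trimHistory(
  messages: ChatMessage[],
  reservedForOutput = 4_000,
  budget = CONTEXT_WINDOW_TOKENS
): ChatMessage[] {
  const limit = budget - reservedForOutput
  const kept: ChatMessage[] = []
  let used = 0
  // Walk from newest to oldest so recent turns survive.
  for (const message of [...messages].reverse()) {
    const cost = estimateTokens(message.content)
    if (used + cost > limit) break
    kept.unshift(message)
    used += cost
  }
  return kept
}
```

This drops the oldest turns first and preserves the original order of what remains. A system prompt that must always be present would need to be handled separately.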

Consider Alternatives When

  • Higher quality ceiling: GPT-4o or GPT-4.1 handle complex reasoning, nuanced writing, or difficult coding tasks better
  • Advanced multimodal processing: More capable vision or audio workloads require a larger model
  • Deep chain-of-thought: o1-mini is purpose-built for extended reasoning

Conclusion

GPT-4o mini made it economically viable to embed language model capability into every layer of an application: not only the final user-facing response, but also the classification, routing, extraction, and tool-use steps throughout a pipeline. Its combination of low price, multimodal input, function calling, and a context window of 128K tokens covers the majority of high-volume production use cases through AI Gateway.

Frequently Asked Questions

  • How does GPT-4o mini compare to GPT-3.5 Turbo on price?

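    GPT-4o mini is priced at $0.15 per million input tokens and $0.60 per million output tokens, which is cheaper than GPT-3.5 Turbo. Current pricing appears on this page and updates as providers adjust their rates. AI Gateway routes traffic through the configured provider.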

  • Does GPT-4o mini support image input?

    Yes. It supports vision alongside text, enabling image analysis, document processing, and visual classification at the small-model cost tier.

  • What benchmark scores did GPT-4o mini achieve?

    82.0% on MMLU, outperforming comparable small models and topping GPT-4 on the LMSYS Chatbot Arena chat preference leaderboard at launch.

  • Is GPT-4o mini suitable for function calling and tool use?

    Yes. Function calling is supported, and OpenAI highlighted agentic pipelines that call external APIs as one of the key intended use cases.

  • What is the context window for GPT-4o mini?

    128K tokens, providing ample space for conversation histories, long codebases, and extended document processing.

  • How does gpt-4o-mini (the alias) differ from gpt-4o-mini-2024-07-18?

    The alias gpt-4o-mini points to the current recommended version and may be updated. The dated snapshot gpt-4o-mini-2024-07-18 is pinned to the specific July 18, 2024 release.

  • What are typical latency characteristics?

    This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.