
o4-mini

o4-mini advances OpenAI's compact reasoning model line with stronger performance and greater efficiency than o3-mini, adding native tool use and image reasoning.

Capabilities: File Input, Reasoning, Tool Use, Vision (Image), Implicit Caching
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'openai/o4-mini',
  prompt: 'Why is the sky blue?',
})
// Print the streamed response as it arrives
for await (const part of result.textStream) {
  process.stdout.write(part)
}

Playground

Try out o4-mini by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

About o4-mini

o4-mini was released on April 16, 2025 alongside o3 as a cost-efficient reasoning model from OpenAI. It advances the compact reasoning model line (following o1-mini and o3-mini) with improvements across reasoning quality, efficiency, and multimodal capability.

A key advancement is native vision support: o4-mini can reason over images, diagrams, mathematical notation, and screenshots, combining visual understanding with chain-of-thought analysis. Earlier mini reasoning models were text-only. This opens up visual reasoning tasks at the affordable mini-tier pricing.
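A minimal sketch of passing an image alongside text using the AI SDK's multimodal message parts; the chart URL and question below are placeholders, not real assets:

import { generateText } from 'ai'

const { text } = await generateText({
  model: 'openai/o4-mini',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What trend does this chart show?' },
        // Image parts accept a URL, a base64 string, or raw bytes
        { type: 'image', image: new URL('https://example.com/chart.png') },
      ],
    },
  ],
})

console.log(text)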

The model supports function calling and tool use, making it suitable as the reasoning layer in lightweight agent architectures. Combined with the reasoning_effort parameter, it lets you build cost-optimized pipelines that apply just enough reasoning to each request.
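As a sketch of both features, the example below defines a hypothetical getOrderStatus tool and requests low reasoning effort. The reasoningEffort provider option is assumed to pass through the gateway to OpenAI, and the inputSchema field follows recent AI SDK releases (older releases use parameters):

import { generateText, tool } from 'ai'
import { z } from 'zod'

const result = await generateText({
  model: 'openai/o4-mini',
  // Assumed pass-through of OpenAI's reasoning effort setting
  providerOptions: { openai: { reasoningEffort: 'low' } },
  tools: {
    // Hypothetical tool for illustration
    getOrderStatus: tool({
      description: 'Look up the shipping status of an order by id',
      inputSchema: z.object({ orderId: z.string() }),
      execute: async ({ orderId }) => ({ orderId, status: 'shipped' }),
    }),
  },
  prompt: 'Is order 42 on its way?',
})

// A single step may end in a tool call rather than final text
console.log(result.toolCalls, result.text)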

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Azure (Legal: Terms, Privacy)
  Context: 200K
  Latency: 2.3s
  Throughput: 177 tps
  Input: $1.10/M
  Output: $4.40/M
  Cache Read: $0.28/M
  Web Search: $14/K per query + input costs
  Release Date: 04/16/2025

OpenAI (Legal: Terms, Privacy)
  Context: 200K
  Latency: 4.0s
  Throughput: 125 tps
  Input: $1.10/M
  Output: $4.40/M
  Cache Read: $0.28/M
  Web Search: $10/K per query + input costs
  Release Date: 04/16/2025
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.


What To Consider When Choosing a Provider

  • Configuration: o4-mini incorporates advances beyond o3-mini, including native vision input, so it can reason over images, diagrams, and documents. It's a strong option for projects that need affordable chain-of-thought reasoning.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
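A rough sketch of supplying the gateway key explicitly rather than through environment configuration; the @ai-sdk/gateway import, the createGateway call, and the env var name are assumptions here, so check the documentation for the exact API:

import { streamText } from 'ai'
import { createGateway } from '@ai-sdk/gateway'

// Construct a gateway provider with an explicit API key
// (GATEWAY_API_KEY is a placeholder variable name)
const gateway = createGateway({ apiKey: process.env.GATEWAY_API_KEY })

const result = streamText({
  model: gateway('openai/o4-mini'),
  prompt: 'Outline three uses for a compact reasoning model.',
})

// Consume the stream so the request actually runs
for await (const part of result.textStream) {
  process.stdout.write(part)
}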

When to Use o4-mini

Best For

  • Affordable chain-of-thought reasoning: Per-request deliberation on technical tasks at scale
  • Visual reasoning: Analyzing diagrams, charts, mathematical notation, and screenshots with step-by-step thinking
  • Tool-using agents: Lightweight reasoning backbone for agents that call external tools and APIs
  • Math and code reasoning: Competition-level problems and algorithmic analysis at accessible cost
  • Mixed-difficulty pipelines: Using reasoning_effort to optimize cost across varied query complexity

Consider Alternatives When

  • Maximum reasoning depth: o3 or o3-pro for the hardest problems requiring exhaustive deliberation
  • General-purpose tasks: GPT-5 mini for workloads that don't benefit from chain-of-thought
  • Coding agent workflows: Codex models for autonomous software engineering
  • Non-reasoning speed: GPT-5.1 Instant for the fastest possible general-purpose responses

Conclusion

o4-mini combines stronger reasoning performance than o3-mini with native vision and tool use at an affordable price point. For technical workloads on AI Gateway that need per-request reasoning with multimodal support, it advances the cost-efficient reasoning tier.

Frequently Asked Questions

  • How does o4-mini improve over o3-mini?

    It delivers stronger reasoning performance with greater efficiency, adds native vision support, and includes improved tool use capabilities.

  • Does o4-mini support image input?

    Yes. Unlike earlier mini reasoning models, it natively processes images, diagrams, and visual content as part of its chain-of-thought reasoning.

  • What is the reasoning_effort parameter?

    It controls how deeply the model reasons per request. Low effort for simple queries saves cost; high effort for hard problems enables thorough deliberation.

  • What context window does o4-mini support?

    200K tokens, providing ample capacity for complex reasoning tasks.

  • How does AI Gateway handle authentication for o4-mini?

    AI Gateway accepts a single API key or OIDC token for all requests. You don't embed OpenAI credentials in your application; AI Gateway routes and authenticates on your behalf.

  • When should I use o3 instead of o4-mini?

    When the hardest problems require maximum reasoning depth and the quality gap between o4-mini and o3 is consequential for your application.

  • What are typical latency characteristics?

    This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.