
o4-mini

o4-mini advances OpenAI's compact reasoning model line with stronger performance and greater efficiency than o3-mini, adding native tool use and image reasoning.

Capabilities: File Input, Reasoning, Tool Use, Vision (Image), Implicit Caching
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'openai/o4-mini',
  prompt: 'Why is the sky blue?',
})
// Print the streamed response as it arrives
for await (const part of result.textStream) {
  process.stdout.write(part)
}

Playground

Try out o4-mini by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

About o4-mini

o4-mini was released on April 16, 2025 alongside o3 as a cost-efficient reasoning model from OpenAI. It advances the compact reasoning model line (following o1-mini and o3-mini) with improvements across reasoning quality, efficiency, and multimodal capability.

A key advancement is native vision support: o4-mini can reason over images, diagrams, mathematical notation, and screenshots, combining visual understanding with chain-of-thought analysis. Earlier mini reasoning models were text-only. This opens up visual reasoning tasks at the affordable mini-tier pricing.
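A minimal sketch of passing an image alongside text using the AI SDK's multimodal message parts; the chart URL and question below are placeholders, not real assets:

import { generateText } from 'ai'

const { text } = await generateText({
  model: 'openai/o4-mini',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What trend does this chart show?' },
        // Image parts accept a URL, a base64 string, or raw bytes
        { type: 'image', image: new URL('https://example.com/chart.png') },
      ],
    },
  ],
})

console.log(text)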

The model supports function calling and tool use, making it suitable as the reasoning layer in lightweight agent architectures. Combined with the reasoning_effort parameter, it lets you build cost-optimized pipelines that apply just enough reasoning to each request.
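As a sketch of both features, the example below defines a hypothetical getOrderStatus tool and requests low reasoning effort. The reasoningEffort provider option is assumed to pass through the gateway to OpenAI, and the inputSchema field follows recent AI SDK releases (older releases use parameters):

import { generateText, tool } from 'ai'
import { z } from 'zod'

const result = await generateText({
  model: 'openai/o4-mini',
  // Assumed pass-through of OpenAI's reasoning effort setting
  providerOptions: { openai: { reasoningEffort: 'low' } },
  tools: {
    // Hypothetical tool for illustration
    getOrderStatus: tool({
      description: 'Look up the shipping status of an order by id',
      inputSchema: z.object({ orderId: z.string() }),
      execute: async ({ orderId }) => ({ orderId, status: 'shipped' }),
    }),
  },
  prompt: 'Is order 42 on its way?',
})

// A single step may end in a tool call rather than final text
console.log(result.toolCalls, result.text)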

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Azure (Legal: Terms, Privacy)
  Context: 200K
  Latency: 2.3s
  Throughput: 177 tps
  Input: $1.10/M
  Output: $4.40/M
  Cache Read: $0.28/M
  Web Search: $14/K per query + input costs
  Release Date: 04/16/2025

OpenAI (Legal: Terms, Privacy)
  Context: 200K
  Latency: 4.0s
  Throughput: 125 tps
  Input: $1.10/M
  Output: $4.40/M
  Cache Read: $0.28/M
  Web Search: $10/K per query + input costs
  Release Date: 04/16/2025
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.


What To Consider When Choosing a Provider

  • Configuration: o4-mini incorporates advances beyond o3-mini, including native vision input, so it can reason over images, diagrams, and documents. It's a strong option for projects that need affordable chain-of-thought reasoning.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
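A rough sketch of supplying the gateway key explicitly rather than through environment configuration; the @ai-sdk/gateway import, the createGateway call, and the env var name are assumptions here, so check the documentation for the exact API:

import { streamText } from 'ai'
import { createGateway } from '@ai-sdk/gateway'

// Construct a gateway provider with an explicit API key
// (GATEWAY_API_KEY is a placeholder variable name)
const gateway = createGateway({ apiKey: process.env.GATEWAY_API_KEY })

const result = streamText({
  model: gateway('openai/o4-mini'),
  prompt: 'Outline three uses for a compact reasoning model.',
})

// Consume the stream so the request actually runs
for await (const part of result.textStream) {
  process.stdout.write(part)
}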

When to Use o4-mini

Best For

  • Affordable chain-of-thought reasoning: Per-request deliberation on technical tasks at scale
  • Visual reasoning: Analyzing diagrams, charts, mathematical notation, and screenshots with step-by-step thinking
  • Tool-using agents: Lightweight reasoning backbone for agents that call external tools and APIs
  • Math and code reasoning: Competition-level problems and algorithmic analysis at accessible cost
  • Mixed-difficulty pipelines: Using reasoning_effort to optimize cost across varied query complexity

Consider Alternatives When

  • Maximum reasoning depth: o3 or o3-pro for the hardest problems requiring exhaustive deliberation
  • General-purpose tasks: GPT-5 mini for workloads that don't benefit from chain-of-thought
  • Coding agent workflows: Codex models for autonomous software engineering
  • Non-reasoning speed: GPT-5.1 Instant for the fastest possible general-purpose responses

Conclusion

o4-mini combines stronger reasoning performance than o3-mini with native vision and tool use at an affordable price point. For technical workloads on AI Gateway that need per-request reasoning with multimodal support, it advances the cost-efficient reasoning tier.

Frequently Asked Questions

  • How does o4-mini improve over o3-mini?

    It delivers stronger reasoning performance with greater efficiency, adds native vision support, and includes improved tool use capabilities.

  • Does o4-mini support image input?

    Yes. Unlike earlier mini reasoning models, it natively processes images, diagrams, and visual content as part of its chain-of-thought reasoning.

  • What is the reasoning_effort parameter?

    It controls how deeply the model reasons per request. Low effort for simple queries saves cost; high effort for hard problems enables thorough deliberation.

  • What context window does o4-mini support?

    200K tokens, providing ample capacity for complex reasoning tasks.

  • How does AI Gateway handle authentication for o4-mini?

    AI Gateway accepts a single API key or OIDC token for all requests. You don't embed OpenAI credentials in your application; AI Gateway routes and authenticates on your behalf.

  • When should I use o3 instead of o4-mini?

    When the hardest problems require maximum reasoning depth and the quality gap between o4-mini and o3 is consequential for your application.

  • What are typical latency characteristics?

    This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.