o3-mini
o3-mini is a cost-efficient reasoning model in the o3 family, delivering strong chain-of-thought performance on math, code, and science at a fraction of full o3's cost, with configurable reasoning effort for flexible cost-quality tradeoffs.
import { streamText } from 'ai'
const result = streamText({ model: 'openai/o3-mini', prompt: 'Why is the sky blue?' })
Playground
Try out o3-mini by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
Each provider is reported with the following metrics:
- P50 throughput on live AI Gateway traffic, in tokens per second (TPS).
- P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds.
- Direct request success rate, for AI Gateway overall and per provider.
About o3-mini
o3-mini was released on January 31, 2025 as the cost-efficient tier of the o3 reasoning model family. It continues the pattern established by o1-mini: delivering strong chain-of-thought reasoning on structured domains (mathematics, coding, science) at a fraction of the full model's cost.
The model supports the reasoning_effort parameter, letting you control reasoning depth per request. Low effort for straightforward technical queries conserves tokens and reduces cost; high effort for competition-level problems applies the full reasoning capability. This flexibility lets you use o3-mini as the default for all technical queries rather than maintaining a routing layer.
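As a rough illustration of per-request effort control, the sketch below builds request options for a low-effort and a high-effort call. The helper name `buildO3MiniRequest` is hypothetical, and the `providerOptions.openai.reasoningEffort` shape is an assumption about how the AI SDK forwards reasoning_effort; check the AI SDK and AI Gateway docs before relying on it.

```typescript
// Sketch only: the option shape below is an assumption, not confirmed by this page.
type ReasoningEffort = 'low' | 'medium' | 'high'

interface O3MiniRequest {
  model: string
  prompt: string
  providerOptions: { openai: { reasoningEffort: ReasoningEffort } }
}

// Hypothetical helper: pair a prompt with a reasoning effort level.
function buildO3MiniRequest(
  prompt: string,
  effort: ReasoningEffort = 'medium',
): O3MiniRequest {
  return {
    model: 'openai/o3-mini',
    prompt,
    providerOptions: { openai: { reasoningEffort: effort } },
  }
}

// Low effort for a straightforward technical query.
const quick = buildO3MiniRequest('What is the time complexity of binary search?', 'low')

// High effort for a competition-style problem.
const hard = buildO3MiniRequest(
  'Prove that there are infinitely many primes of the form 4k + 3.',
  'high',
)
```

If the assumed option shape holds, either object could be passed to streamText in place of the inline options used in the snippet at the top of the page.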
With a context window of 200K tokens and support for the standard API features, o3-mini handles the same types of text requests as full o3. The tradeoff is concentrated in reasoning depth: on the hardest problems, full o3 will produce more thorough analysis.
What To Consider When Choosing a Provider
- Configuration: o3-mini makes chain-of-thought reasoning affordable enough to run on every request rather than reserving it for the hardest problems. The reasoning_effort parameter enables further cost optimization.
- Configuration: Like o1-mini before it, o3-mini concentrates its reasoning capability on structured problem domains rather than broad general knowledge.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
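For the authentication point above, here is a minimal sketch of choosing between the two credential types when calling the gateway over HTTP. The helper name `resolveGatewayAuthHeader`, the environment variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN`, and the bearer-token header format are all assumptions for illustration; confirm them against the AI Gateway documentation.

```typescript
// Assumed environment variable names; confirm them against the AI Gateway docs.
type Env = Record<string, string | undefined>

// Hypothetical helper: prefer an API key, fall back to an OIDC token.
function resolveGatewayAuthHeader(env: Env): { Authorization: string } {
  const token = env['AI_GATEWAY_API_KEY'] ?? env['VERCEL_OIDC_TOKEN']
  if (!token) {
    // Neither credential is set (or the chosen one is empty).
    throw new Error('No AI Gateway API key or OIDC token found')
  }
  return { Authorization: `Bearer ${token}` }
}
```

In a Node.js service this might be called as `resolveGatewayAuthHeader(process.env)`. Note that no provider-specific key (for example an OpenAI key) appears anywhere in the sketch, which is the point the bullet makes.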
When to Use o3-mini
Best For
- Math and science reasoning: Competition-level problems, derivations, and quantitative analysis at accessible cost
- Code reasoning: Algorithm analysis, debugging, and optimization with step-by-step deliberation
- High-frequency reasoning pipelines: Per-request chain-of-thought on technical workloads at scale
- Education platforms: Tutoring and problem-solving assistance with visible reasoning steps
- Cost-optimized reasoning: Tasks that benefit from deliberation but don't justify full o3 pricing
Consider Alternatives When
- Maximum reasoning quality: Full o3 for the hardest problems where every increment of accuracy matters
- Broader knowledge needed: Full o3 or GPT-5 for tasks requiring wide-ranging factual recall
- Fastest reasoning: o4-mini for a newer cost-efficient reasoning option with vision support
- General-purpose tasks: GPT-5 mini for workloads that don't benefit from chain-of-thought
Conclusion
o3-mini makes chain-of-thought reasoning broadly accessible by bringing o3-family performance to a cost tier that scales. For technical workloads on AI Gateway where per-request reasoning is desirable but full o3 pricing is not, it provides the right balance.