o3-mini
o3-mini is a cost-efficient reasoning model in the o3 family, delivering strong chain-of-thought performance on math, code, and science at a fraction of full o3's cost, with configurable reasoning effort for flexible cost-quality tradeoffs.
import { streamText } from 'ai'
const result = streamText({ model: 'openai/o3-mini', prompt: 'Why is the sky blue?' })
Playground
Try out o3-mini by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.
P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.
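The provider preference described above can be sketched as a small helper that turns copied provider slugs into routing options. This is a minimal sketch, not the gateway's confirmed API: the `gateway.order` option shape, the helper name `buildProviderPreference`, and the slugs `'openai'` and `'azure'` are assumptions for illustration, so check the docs for the exact option names and the slugs available for this model.

```typescript
// Hypothetical helper: turns a list of provider slugs into routing options.
// The { gateway: { order } } shape is an assumption; verify it in the docs.
function buildProviderPreference(slugs: string[]) {
  if (slugs.length === 0) {
    throw new Error('At least one provider slug is required')
  }
  // Drop duplicate slugs while keeping the first-seen order.
  const order = slugs.filter((slug, index) => slugs.indexOf(slug) === index)
  return { gateway: { order } }
}

// Example slugs are placeholders; copy the real ones from the provider table.
const providerOptions = buildProviderPreference(['openai', 'azure', 'openai'])
console.log(providerOptions.gateway.order.join(','))
```

If the option shape matches your SDK version, the returned object would be passed as `providerOptions` alongside `model` and `prompt`.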
About o3-mini
o3-mini was announced on December 20, 2024 and released on January 31, 2025 as the cost-efficient tier of the o3 reasoning model family. It continues the pattern established by o1-mini: delivering strong chain-of-thought reasoning on structured domains (mathematics, coding, science) at a fraction of the full model's cost.
The model supports the reasoning_effort parameter, letting you control reasoning depth per request. Low effort for straightforward technical queries conserves tokens and reduces cost; high effort for competition-level problems applies the full reasoning capability. This flexibility lets you use o3-mini as the default for all technical queries rather than maintaining a routing layer.
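One way to use a single model in place of a routing layer is to choose the effort level per request. The sketch below is illustrative only: the `Difficulty` labels and both helper functions are hypothetical, and the `providerOptions.openai.reasoningEffort` shape is an assumption about how the AI SDK forwards the reasoning_effort parameter, so confirm it against the current documentation.

```typescript
// Hypothetical application-level labels; they are not part of any API.
type Difficulty = 'routine' | 'moderate' | 'competition'
type ReasoningEffort = 'low' | 'medium' | 'high'

// Map a difficulty label to a reasoning effort level.
function pickReasoningEffort(difficulty: Difficulty): ReasoningEffort {
  switch (difficulty) {
    case 'routine':
      return 'low'
    case 'moderate':
      return 'medium'
    case 'competition':
      return 'high'
  }
}

// Build the arguments for one request. The providerOptions shape is an
// assumption for this sketch; only the model slug comes from this page.
function buildRequest(prompt: string, difficulty: Difficulty) {
  return {
    model: 'openai/o3-mini',
    prompt,
    providerOptions: {
      openai: { reasoningEffort: pickReasoningEffort(difficulty) },
    },
  }
}

const request = buildRequest('Prove that the square root of 2 is irrational.', 'competition')
console.log(request.providerOptions.openai.reasoningEffort) // prints "high"
```

Under these assumptions, the returned object would be passed to `streamText` in place of the inline arguments, so low-effort and high-effort requests share one code path.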
With a context window of 200K tokens and support for the standard API features, o3-mini handles the same types of requests as full o3. The tradeoff is concentrated in reasoning depth: on the hardest problems, full o3 will produce more thorough analysis.
What To Consider When Choosing a Provider
- Configuration: o3-mini makes chain-of-thought reasoning affordable enough to run on every request rather than reserving it for the hardest problems. The reasoning_effort parameter enables further cost optimization.
- Configuration: Like o1-mini before it, o3-mini concentrates its reasoning capability on structured problem domains rather than broad general knowledge.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
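The authentication point above can be sketched as a credential lookup that prefers an API key and falls back to an OIDC token. This is a minimal sketch under stated assumptions: the environment variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN`, the Bearer header scheme, and both helper names are assumptions for illustration rather than confirmed details from this page.

```typescript
// The environment is passed in explicitly so the helper is easy to test.
type Env = Record<string, string | undefined>

// Prefer an API key, then fall back to an OIDC token.
// Both variable names are assumptions; check the documentation.
function resolveGatewayToken(env: Env): string {
  const token = env.AI_GATEWAY_API_KEY || env.VERCEL_OIDC_TOKEN
  if (!token) {
    throw new Error('No AI Gateway credential found in the environment')
  }
  return token
}

// Build request headers. The Bearer scheme is an assumption for this sketch.
// No provider (OpenAI) credential appears anywhere in the application.
function buildAuthHeaders(env: Env): Record<string, string> {
  return {
    Authorization: `Bearer ${resolveGatewayToken(env)}`,
    'Content-Type': 'application/json',
  }
}

const headers = buildAuthHeaders({ AI_GATEWAY_API_KEY: 'example-key' })
console.log(headers.Authorization) // prints "Bearer example-key"
```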
When to Use o3-mini
Best For
- Math and science reasoning: Competition-level problems, derivations, and quantitative analysis at accessible cost
- Code reasoning: Algorithm analysis, debugging, and optimization with step-by-step deliberation
- High-frequency reasoning pipelines: Per-request chain-of-thought on technical workloads at scale
- Education platforms: Tutoring and problem-solving assistance with visible reasoning steps
- Cost-optimized reasoning: Tasks that benefit from deliberation but don't justify full o3 pricing
Consider Alternatives When
- Maximum reasoning quality: Full o3 for the hardest problems where every increment of accuracy matters
- Broader knowledge needed: Full o3 or GPT-5 for tasks requiring wide-ranging factual recall
- Fastest reasoning: o4-mini for a newer cost-efficient reasoning option with vision support
- General-purpose tasks: GPT-5 mini for workloads that don't benefit from chain-of-thought
Conclusion
o3-mini makes chain-of-thought reasoning broadly accessible by bringing o3-family performance to a cost tier that scales. For technical workloads on AI Gateway where per-request reasoning is desirable but full o3 pricing is not, it provides the right balance.
Frequently Asked Questions
How does o3-mini compare to o1-mini?
o3-mini is the next generation of cost-efficient reasoning, delivering stronger performance on key benchmarks while maintaining the affordability that makes per-request reasoning practical.
Does o3-mini support the reasoning_effort parameter?
Yes. You can control reasoning depth per request, enabling cost optimization across mixed-difficulty workloads.
What context window does o3-mini support?
200K tokens, matching the o3 family.
When should I use full o3 instead?
When the hardest problems require maximum reasoning depth and the quality gap between mini and full is consequential for your application.
How does AI Gateway handle authentication for o3-mini?
AI Gateway accepts a single API key or OIDC token for all requests. You don't embed OpenAI credentials in your application; AI Gateway routes and authenticates on your behalf.
What are typical latency characteristics?
This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.