
o3-mini

o3-mini is a cost-efficient reasoning model in the o3 family. It delivers strong chain-of-thought performance on math, code, and science at a fraction of full o3's cost, and its configurable reasoning effort enables flexible cost-quality tradeoffs.

File Input · Reasoning · Tool Use · Implicit Caching
index.ts
import { streamText } from 'ai';

const result = streamText({
  model: 'openai/o3-mini',
  prompt: 'Why is the sky blue?',
});

// Print the response as it streams in.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

Frequently Asked Questions

  • How does o3-mini compare to o1-mini?

    o3-mini is the next generation of cost-efficient reasoning, delivering stronger performance on key benchmarks while maintaining the affordability that makes per-request reasoning practical.

  • Does o3-mini support the reasoning_effort parameter?

    Yes. You can control reasoning depth per request, which enables cost optimization across mixed-difficulty workloads; a sketch appears after this list.

  • What context window does o3-mini support?

    200K tokens, matching the o3 family.

  • When should I use full o3 instead?

    When your hardest problems require maximum reasoning depth and the quality gap between o3-mini and full o3 is consequential for your application; a simple routing sketch appears after this list.

  • How does AI Gateway handle authentication for o3-mini?

    AI Gateway accepts a single API key or OIDC token for all requests. You don't embed OpenAI credentials in your application; AI Gateway authenticates and routes on your behalf. A configuration sketch appears after this list.

  • What are typical latency characteristics?

    This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.
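
A minimal sketch of per-request reasoning effort, assuming the AI SDK's provider-options pass-through under the openai key; the reasoningEffort option name and the 'low' value shown are OpenAI-style settings and may vary by SDK version.

import { streamText } from 'ai';

// 'low' keeps cost and latency down for routine questions;
// 'high' spends more reasoning tokens on harder problems.
const result = streamText({
  model: 'openai/o3-mini',
  prompt: 'Why is the sky blue?',
  providerOptions: {
    openai: { reasoningEffort: 'low' },
  },
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}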
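
A hypothetical routing helper for the mini-versus-full tradeoff; the model IDs are AI Gateway slugs, and how a request gets classified as 'hard' is left to the application.

// Hypothetical helper: escalate only tasks classified as hard to full o3.
function pickModel(difficulty: 'routine' | 'hard'): string {
  return difficulty === 'hard' ? 'openai/o3' : 'openai/o3-mini';
}

pickModel('routine'); // 'openai/o3-mini'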
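
A configuration sketch assuming the @ai-sdk/gateway provider package; by default the gateway provider reads AI_GATEWAY_API_KEY from the environment, so the explicit apiKey shown here is optional.

import { createGateway } from '@ai-sdk/gateway';
import { generateText } from 'ai';

// One gateway credential covers all upstream providers;
// no OpenAI key ships with the application.
const gateway = createGateway({
  apiKey: process.env.AI_GATEWAY_API_KEY,
});

const { text } = await generateText({
  model: gateway('openai/o3-mini'),
  prompt: 'Summarize Rayleigh scattering in one sentence.',
});
console.log(text);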