o1
o1 is the production reasoning model that combines extended chain-of-thought computation with full tool support, structured outputs, vision, and a `reasoning_effort` parameter, delivering deeper problem-solving while using on average 60% fewer reasoning tokens than o1-preview.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'openai/o1',
  prompt: 'Why is the sky blue?',
})
```
Playground
Try out o1 by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.
P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.
About o1
OpenAI released o1 in ChatGPT on December 5, 2024, and made it available in the API on December 17, 2024 as o1-2024-12-17. This is the point where OpenAI's reasoning architecture became production-ready. The September 2024 preview proved the concept, with chain-of-thought reasoning scoring 83% on a qualifying exam for the International Mathematics Olympiad (IMO), but it shipped without the API features production systems depend on. The production o1 fills those gaps.
Function calling means o1 can participate in agentic workflows: querying databases, hitting APIs, and invoking tools mid-reasoning. Structured Outputs via constrained JSON schema decoding let downstream systems consume responses without fragile parsing. Developer system messages restore the ability to set behavioral constraints and context. Vision input enables reasoning over images, circuit diagrams, mathematical notation in photographs, and charts that require interpretation.
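As a rough illustration of how these features might combine in a single request, the sketch below assembles an OpenAI-style Chat Completions body with a developer message, one function tool, and a strict JSON schema for the final answer. The tool name `lookup_order`, the schema fields, and the helper `buildOrderStatusRequest` are hypothetical, and whether the endpoint you call accepts exactly this body shape is an assumption to verify against its documentation. The body is only built and printed here, not sent.

```typescript
// Minimal sketch: build (but do not send) an OpenAI-style chat request body.
// `lookup_order` and both schemas are hypothetical, for illustration only.

type JsonSchema = Record<string, unknown>;

interface ChatRequest {
  model: string;
  messages: { role: 'developer' | 'user'; content: string }[];
  tools: {
    type: 'function';
    function: { name: string; description: string; parameters: JsonSchema };
  }[];
  response_format: {
    type: 'json_schema';
    json_schema: { name: string; strict: boolean; schema: JsonSchema };
  };
}

function buildOrderStatusRequest(question: string): ChatRequest {
  return {
    model: 'openai/o1',
    messages: [
      // Developer message: sets behavioral constraints for the model.
      { role: 'developer', content: 'Answer only from tool results.' },
      { role: 'user', content: question },
    ],
    tools: [
      {
        type: 'function',
        function: {
          name: 'lookup_order', // hypothetical tool
          description: 'Fetch an order record by its ID.',
          parameters: {
            type: 'object',
            properties: { orderId: { type: 'string' } },
            required: ['orderId'],
            additionalProperties: false,
          },
        },
      },
    ],
    // Structured Outputs: the final answer must conform to this schema.
    response_format: {
      type: 'json_schema',
      json_schema: {
        name: 'order_status',
        strict: true,
        schema: {
          type: 'object',
          properties: {
            status: { type: 'string' },
            explanation: { type: 'string' },
          },
          required: ['status', 'explanation'],
          additionalProperties: false,
        },
      },
    },
  };
}

const request = buildOrderStatusRequest('Where is order 1234?');
console.log(JSON.stringify(request, null, 2));
```

Because the response is constrained to the schema, a downstream consumer could read `status` and `explanation` directly after `JSON.parse` instead of scraping free-form text.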
The efficiency gains are equally significant. o1 uses 60% fewer reasoning tokens on average compared to o1-preview for equivalent quality. Fewer reasoning tokens means lower cost per request and shorter time-to-first-token. The context window of 200K tokens (expanded from the preview's 128K) accommodates the longer inputs that complex reasoning tasks demand.
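To make the arithmetic concrete, here is a back-of-the-envelope sketch. It assumes reasoning tokens are billed at the output-token rate; the price and the token counts are placeholders chosen for illustration, not published figures.

```typescript
// Hypothetical numbers throughout: substitute your own price and token counts.
const OUTPUT_PRICE_PER_MILLION_USD = 60; // placeholder price
const REASONING_TOKEN_REDUCTION = 0.6; // the "60% fewer reasoning tokens" claim

// Cost of the output side of one request, assuming reasoning tokens
// are billed like ordinary output tokens.
function outputCostUsd(reasoningTokens: number, answerTokens: number): number {
  return ((reasoningTokens + answerTokens) / 1_000_000) * OUTPUT_PRICE_PER_MILLION_USD;
}

const previewReasoningTokens = 10_000; // placeholder
const answerTokens = 500; // placeholder, same visible answer in both cases
const o1ReasoningTokens = Math.round(
  previewReasoningTokens * (1 - REASONING_TOKEN_REDUCTION),
);

const previewCost = outputCostUsd(previewReasoningTokens, answerTokens);
const o1Cost = outputCostUsd(o1ReasoningTokens, answerTokens);

console.log(`o1-preview: $${previewCost.toFixed(2)}  o1: $${o1Cost.toFixed(2)}`);
```

Under these placeholder numbers the per-request output cost falls from about $0.63 to about $0.27. The saving is a little under 60% because the visible answer tokens are unchanged; only the reasoning portion shrinks.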
The `reasoning_effort` parameter, new in the production o1 and absent from o1-preview, controls how deeply the model thinks. It accepts low, medium, or high. Set it low for questions where a quick chain of thought suffices. Set it high for problems that genuinely require extended deliberation. In a pipeline mixing easy and hard queries, this single parameter can cut aggregate reasoning token spend, because easy queries no longer pay for deliberation they don't need.
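One way to apply this in a mixed pipeline is to choose the effort level per request. The sketch below uses a deliberately naive, hypothetical heuristic (`pickReasoningEffort`) and builds the options object you might pass to an AI SDK call. The `providerOptions.openai.reasoningEffort` shape follows the AI SDK's OpenAI provider; that it passes through AI Gateway unchanged is an assumption, not something this page confirms.

```typescript
type ReasoningEffort = 'low' | 'medium' | 'high';

// Hypothetical heuristic: proof/debug-style keywords get high effort,
// long prompts get medium, everything else gets low.
function pickReasoningEffort(prompt: string): ReasoningEffort {
  const hardSignals = /\b(prove|derive|debug|optimi[sz]e)\b/i;
  if (hardSignals.test(prompt)) return 'high';
  if (prompt.length > 400) return 'medium';
  return 'low';
}

// Builds the options you would hand to streamText or generateText.
// Nothing is sent here; this only constructs the object.
function buildCallOptions(prompt: string) {
  return {
    model: 'openai/o1',
    prompt,
    providerOptions: {
      // Assumed option name, per the AI SDK OpenAI provider.
      openai: { reasoningEffort: pickReasoningEffort(prompt) },
    },
  };
}

const easy = buildCallOptions('Why is the sky blue?');
const hard = buildCallOptions('Prove that the square root of 2 is irrational.');

console.log(easy.providerOptions.openai.reasoningEffort); // low
console.log(hard.providerOptions.openai.reasoningEffort); // high
```

In practice the routing signal would more likely come from a classifier, a queue label, or the caller, but the shape is the same: one model and one endpoint, with depth chosen per request.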
What To Consider When Choosing a Provider
- Configuration: The `reasoning_effort` parameter lets you dial reasoning depth up or down per request. A single deployment can handle both lightweight queries and hard problems without switching models or endpoints.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
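For the authentication point, a minimal sketch of credential selection might look like the following. The environment variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN`, the bearer-token header, and the helper names are assumptions for illustration; check the AI Gateway documentation for the exact mechanism in your deployment.

```typescript
// Sketch: pick a gateway credential from an environment-like object.
// Variable names and the Bearer scheme are assumptions, not confirmed here.
type Env = Record<string, string | undefined>;

interface GatewayCredential {
  kind: 'api-key' | 'oidc';
  token: string;
}

function resolveGatewayCredential(env: Env): GatewayCredential {
  const apiKey = env['AI_GATEWAY_API_KEY'];
  if (apiKey) return { kind: 'api-key', token: apiKey };

  const oidcToken = env['VERCEL_OIDC_TOKEN'];
  if (oidcToken) return { kind: 'oidc', token: oidcToken };

  throw new Error('No AI Gateway credential found in environment');
}

function authHeader(credential: GatewayCredential): Record<string, string> {
  return { Authorization: `Bearer ${credential.token}` };
}

// Example with an in-memory environment instead of the real process env.
const credential = resolveGatewayCredential({ AI_GATEWAY_API_KEY: 'example-key' });
console.log(credential.kind, Object.keys(authHeader(credential)));
```

Either way, only the gateway credential is handled by your code; no provider-specific key appears anywhere in the request path.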
When to Use o1
Best For
- Agentic pipelines: Function calling and structured outputs make o1 a complete agent backbone that combines deep reasoning with tool calls
- Mathematical problem solving: Multi-step proofs and quantitative analysis requiring verified chain-of-thought
- Complex debugging: Architecture review where the model benefits from working through multiple approaches before committing
- Mixed-difficulty workloads: `reasoning_effort` lets you optimize cost per request without switching models
- Visual reasoning: Interpreting charts, diagrams, or handwritten notation as part of a larger analytical problem
Consider Alternatives When
- Conversational or generative tasks: GPT-4o or GPT-4.1 respond faster and more cheaply when extended chain-of-thought isn't needed
- Cost-sensitive STEM reasoning: o1-mini offers nearly equivalent math and coding scores at lower cost
- Low-latency streaming: Reasoning token generation adds latency before the first visible token, which real-time applications may not tolerate
Conclusion
The production o1 is a reasoning model that fits into real systems. Function calling, structured outputs, vision, system messages, a larger context window, and per-request reasoning control make it suitable for deployment. If your application needs a model that reasons carefully and then acts on its conclusions, route it through AI Gateway.