o1
o1 is OpenAI's production reasoning model. It combines extended chain-of-thought computation with full tool support, structured outputs, vision, and a `reasoning_effort` parameter, delivering deeper problem-solving while using 60% fewer reasoning tokens than o1-preview.
import { streamText } from 'ai'
const result = streamText({ model: 'openai/o1', prompt: 'Why is the sky blue?' })

What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
The reasoning_effort parameter lets you dial reasoning depth up or down per request. A single deployment can handle both lightweight queries and hard problems without switching models or endpoints.
When to Use o1
Best For
Agentic pipelines:
Function calling and structured outputs make o1 a complete agent backbone that combines deep reasoning with tool calls
Mathematical problem solving:
Multi-step proofs and quantitative analysis requiring verified chain-of-thought
Complex debugging:
Architecture review where the model benefits from working through multiple approaches before committing
Mixed-difficulty workloads:
reasoning_effort lets you optimize cost per request without switching models
Visual reasoning:
Interpreting charts, diagrams, or handwritten notation as part of a larger analytical problem
Consider Alternatives When
Conversational or generative tasks:
GPT-4o or GPT-4.1 respond faster and more cheaply when extended chain-of-thought isn't needed
Cost-sensitive STEM reasoning:
o1-mini offers nearly equivalent math and coding scores at lower cost
Low-latency streaming:
Reasoning token generation introduces inherent latency that real-time responses can't tolerate
Conclusion
The production o1 is a reasoning model that fits into real systems. Function calling, structured outputs, vision, system messages, a larger context window, and per-request reasoning control make it suitable for deployment. If your application needs a model that reasons carefully and then acts on its conclusions, route it through AI Gateway.
FAQ
How does reasoning_effort affect cost and latency?
Lower effort values reduce the number of reasoning tokens the model generates before answering, which directly lowers both cost (reasoning tokens are billed as output) and time-to-first-token. A pipeline that sets low effort for simple queries and high effort for complex ones can cut aggregate reasoning spend significantly.
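A back-of-envelope model makes the savings tangible. The price and token counts below are placeholders, not real pricing; the point is only that reasoning tokens are billed as output, so triaging effort cuts spend roughly in proportion.

```typescript
// Placeholder $/1M output-token price; substitute current pricing.
const OUTPUT_PRICE_PER_1M = 60

export function costUSD(reasoningTokens: number, answerTokens: number): number {
  // Reasoning tokens are billed as output tokens alongside the visible answer.
  return ((reasoningTokens + answerTokens) / 1_000_000) * OUTPUT_PRICE_PER_1M
}

// 1,000 requests: high effort everywhere vs. triage
// (low effort for the easy 80%). Token counts are illustrative.
export const allHigh = 1000 * costUSD(8000, 500)
export const triaged = 800 * costUSD(1000, 500) + 200 * costUSD(8000, 500)
```

Under these illustrative numbers the triaged pipeline spends roughly a third of the all-high budget.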
What capabilities does the production o1 add over o1-preview?
Function calling for tool use, developer system messages for behavioral control, Structured Outputs via JSON-schema-constrained decoding, and vision input for image reasoning. The preview supported none of these.
How large is the context window?
The preview offered 128K tokens. The production o1 supports 200K tokens, enabling substantially longer documents, conversation histories, and multi-source inputs in reasoning tasks.
Can o1 drive agentic workflows?
Yes. Function calling support means o1 can invoke tools mid-reasoning, receive results, and incorporate them into its chain of thought. Combined with Structured Outputs, it can produce machine-readable action plans that downstream orchestrators consume directly.
Why does the production o1 use fewer reasoning tokens than the preview?
OpenAI optimized the production model's reasoning efficiency. It reaches equivalent quality conclusions with fewer intermediate steps, which translates to lower per-request cost and faster responses.
Should I route every request through o1?
No. Reasoning token generation adds latency and cost that is wasted on simple questions. For general chat, GPT-4o or GPT-4.1 are faster and cheaper. Reserve o1 for the subset of queries that genuinely benefit from extended deliberation, or use reasoning_effort at a low setting to triage.
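The triage described above can be sketched as a small router. The `needsReasoning` flag and the length threshold stand in for a real difficulty classifier, and the model strings follow the gateway's `provider/model` convention used elsewhere on this page.

```typescript
// Route of a request: which model to call and, for o1, at what effort.
export type Route = { model: string; reasoningEffort?: 'low' | 'high' }

// Illustrative triage: conversational queries go to a cheaper general model;
// only queries flagged as needing deliberation reach o1, with effort scaled
// by a stand-in size heuristic.
export function route(query: string, needsReasoning: boolean): Route {
  if (!needsReasoning) return { model: 'openai/gpt-4o' }
  return {
    model: 'openai/o1',
    reasoningEffort: query.length > 200 ? 'high' : 'low',
  }
}
```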