
o1

o1 is OpenAI's production reasoning model. It combines extended chain-of-thought computation with tool support, structured outputs, vision, and a reasoning_effort parameter, and on average it uses 60% fewer reasoning tokens than o1-preview.

Capabilities: File Input, Reasoning, Tool Use, Vision (Image), Implicit Caching
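index.ts
```typescript
import { streamText } from 'ai'
// 'openai/o1' is the AI Gateway slug in creator/model form
const result = streamText({
  model: 'openai/o1',
  prompt: 'Why is the sky blue?',
})
// Print text chunks as they arrive
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}
```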

Playground

Try out o1 by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

| Provider | Context | Latency | Throughput | Input | Output | Cache read | Web search | Legal | Release date |
|---|---|---|---|---|---|---|---|---|---|
| Azure | 200K | 1.0s | — | $15.00/M | $60.00/M | $7.5/M | $14/K + input costs | Terms, Privacy | 12/05/2024 |
| OpenAI | 200K | 0.8s | 156 tps | $15.00/M | $60.00/M | $7.5/M | — | Terms, Privacy | 12/05/2024 |
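To see how the table's prices combine, here is a minimal cost sketch for a single o1 request. It assumes reasoning tokens are billed at the output rate (as the FAQ below notes) and that cached input tokens are billed at the cache-read rate instead of the input rate. The token counts are hypothetical, and web search charges are left out.

```typescript
// Per-million-token prices (USD) from the provider table above.
interface O1Pricing {
  inputPerM: number;     // uncached input tokens
  outputPerM: number;    // visible output tokens and reasoning tokens
  cacheReadPerM: number; // cached input tokens
}

const O1_PRICING: O1Pricing = { inputPerM: 15, outputPerM: 60, cacheReadPerM: 7.5 };

interface O1Usage {
  inputTokens: number;       // total prompt tokens, including cached ones
  cachedInputTokens: number; // portion of inputTokens served from cache
  outputTokens: number;      // visible answer tokens
  reasoningTokens: number;   // hidden chain-of-thought tokens
}

function estimateCostUSD(u: O1Usage, p: O1Pricing = O1_PRICING): number {
  const uncachedInput = u.inputTokens - u.cachedInputTokens;
  // Assumption: reasoning tokens are billed at the output rate.
  const billedOutput = u.outputTokens + u.reasoningTokens;
  const microDollarsTimesM =
    uncachedInput * p.inputPerM +
    u.cachedInputTokens * p.cacheReadPerM +
    billedOutput * p.outputPerM;
  return microDollarsTimesM / 1_000_000;
}

// Hypothetical request: 10K prompt tokens (4K cached), 1K answer tokens, 3K reasoning tokens.
const cost = estimateCostUSD({
  inputTokens: 10_000,
  cachedInputTokens: 4_000,
  outputTokens: 1_000,
  reasoningTokens: 3_000,
});
console.log(cost.toFixed(2)); // 0.36
```

In this example the hidden reasoning tokens cost more than the whole prompt, which is why reasoning-token volume dominates o1 spend.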
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Success rate of direct requests, measured on AI Gateway overall and per provider. Visit the docs for more info.

More models by OpenAI

| Context | Latency | Throughput | Input | Output | Cache read | Web search | Providers | Release date |
|---|---|---|---|---|---|---|---|---|
| 1M | 3.2s | 45 tps | $5.00/M | $30.00/M | $0.5/M | $10.00/K + input costs | Azure, OpenAI | 04/24/2026 |
| 400K | 1.5s | 235 tps | $0.75/M | $4.50/M | $0.07/M | $10.00/K + input costs | Azure, OpenAI | 03/17/2026 |
| 400K | 0.4s | 19 tps | $0.20/M | $1.25/M | $0.02/M | $10.00/K + input costs | Azure, OpenAI | 03/17/2026 |
| 1.1M | 0.9s | 65 tps | $2.50/M | $15.00/M | $0.25/M | $10.00/K + input costs | Azure, OpenAI | 03/05/2026 |
| 128K | 0.5s | 108 tps | $1.25/M | $10.00/M | $0.13/M | $10.00/K + input costs | Azure, OpenAI | 11/12/2025 |
| 131K | 0.1s | 241 tps | $0.35/M | $0.75/M | $0.25/M | — | Baseten, Bedrock, Cerebras, +5 more | 08/05/2025 |

About o1

OpenAI released o1 on December 5, 2024; the API model snapshot is o1-2024-12-17. This release made OpenAI's reasoning architecture production-ready. The September 2024 preview proved the concept, with OpenAI reporting that its reasoning model solved 83% of the problems on an International Mathematical Olympiad (IMO) qualifying exam, but o1-preview shipped without the API features production systems depend on. The production o1 fills those gaps.

Function calling lets o1 participate in agentic workflows: it can request tool calls that query databases or hit APIs, then reason over the results. Structured Outputs, which constrain decoding to a supplied JSON schema, let downstream systems consume responses without fragile parsing. Developer system messages restore the ability to set behavioral constraints and context. Vision input enables reasoning over images such as circuit diagrams, photographed mathematical notation, and charts that require interpretation.
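The sketch below shows the general shape of an agent loop that combines function calling with a structured final answer. It is a minimal, offline illustration under stated assumptions: `stubModel` is a hypothetical stand-in for an o1 call, the message and tool types are invented for the example (they are not the AI SDK or OpenAI wire formats), and the `isActionPlan` type guard only mirrors locally the contract that a JSON schema would enforce through Structured Outputs.

```typescript
// Hypothetical types for illustration; not the AI SDK or OpenAI API shapes.
type ToolCall = { tool: string; args: Record<string, unknown> };
type ModelTurn = { kind: "tool_call"; call: ToolCall } | { kind: "final"; json: string };
type Message = { role: "user" | "tool"; content: string };

// Hypothetical tool registry with one fake database lookup.
const tools: Record<string, (args: Record<string, unknown>) => string> = {
  lookupOrder: (args) => JSON.stringify({ id: args.id, status: "shipped" }),
};

// Stub standing in for o1: requests one tool call, then answers with JSON.
function stubModel(history: Message[]): ModelTurn {
  const sawToolResult = history.some((m) => m.role === "tool");
  if (!sawToolResult) {
    return { kind: "tool_call", call: { tool: "lookupOrder", args: { id: "A-17" } } };
  }
  const result = JSON.parse(history[history.length - 1].content);
  return {
    kind: "final",
    json: JSON.stringify({ action: "notify_customer", orderStatus: result.status }),
  };
}

// The shape the downstream system expects (what a JSON schema would describe).
interface ActionPlan {
  action: string;
  orderStatus: string;
}

function isActionPlan(x: unknown): x is ActionPlan {
  return (
    typeof x === "object" &&
    x !== null &&
    typeof (x as ActionPlan).action === "string" &&
    typeof (x as ActionPlan).orderStatus === "string"
  );
}

// Loop: run requested tools, feed results back, stop at a structured answer.
function runAgent(prompt: string, maxSteps = 5): ActionPlan {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = stubModel(history);
    if (turn.kind === "final") {
      const parsed: unknown = JSON.parse(turn.json);
      if (!isActionPlan(parsed)) throw new Error("response did not match the expected schema");
      return parsed;
    }
    const impl = tools[turn.call.tool];
    if (!impl) throw new Error(`unknown tool: ${turn.call.tool}`);
    history.push({ role: "tool", content: impl(turn.call.args) });
  }
  throw new Error("agent did not finish within maxSteps");
}

console.log(runAgent("Where is order A-17?"));
// { action: 'notify_customer', orderStatus: 'shipped' }
```

The `maxSteps` cap is a common safeguard in such loops so that a model which keeps requesting tools cannot run indefinitely.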

The efficiency gains are equally significant. o1 uses 60% fewer reasoning tokens on average compared to o1-preview for equivalent quality. Fewer reasoning tokens means lower cost per request and shorter time-to-first-token. The context window of 200K tokens (expanded from the preview's 128K) accommodates the longer inputs that complex reasoning tasks demand.
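As a rough illustration of what the 60% average reduction means per request, the arithmetic below holds the price fixed at o1's $60/M output rate and compares a hypothetical o1-preview reasoning-token count with the expected o1 count. Because 60% is an average, real savings vary from prompt to prompt.

```typescript
// Hold the price fixed at o1's output rate so only the token count changes.
const OUTPUT_PRICE_PER_M = 60; // USD per million output tokens (provider table)
const AVG_REDUCTION = 0.6;     // "60% fewer reasoning tokens on average"

// Expected o1 reasoning tokens for a request that used `previewTokens` on o1-preview.
function o1ReasoningTokens(previewTokens: number): number {
  return Math.round(previewTokens * (1 - AVG_REDUCTION));
}

function reasoningCostUSD(tokens: number): number {
  return (tokens * OUTPUT_PRICE_PER_M) / 1_000_000;
}

const previewTokens = 5_000; // hypothetical reasoning tokens per request on o1-preview
const o1Tokens = o1ReasoningTokens(previewTokens);
const savedPerRequest = reasoningCostUSD(previewTokens - o1Tokens);

console.log(o1Tokens);                             // 2000
console.log(savedPerRequest.toFixed(2));           // 0.18
console.log((savedPerRequest * 1_000).toFixed(2)); // 180.00 per 1,000 requests
```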

The reasoning_effort parameter, which o1-preview lacked, controls how deeply the model thinks. Set it low for questions where a quick chain of thought suffices. Set it high for problems that genuinely require extended deliberation. In a pipeline mixing easy and hard queries, this single parameter can substantially cut aggregate reasoning token spend.
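A minimal triage sketch for a mixed-difficulty pipeline: a hypothetical classifier result (`QueryTraits`) selects a reasoning_effort value, which is then placed in request options for the AI SDK. The thresholds are arbitrary, and the `providerOptions.openai.reasoningEffort` path is an assumption to check against the AI SDK and AI Gateway documentation for your version.

```typescript
type ReasoningEffort = "low" | "medium" | "high";

// Hypothetical signals from your own query classifier.
interface QueryTraits {
  needsMultiStepProof: boolean;
  estimatedSteps: number; // rough count of reasoning steps
}

// Arbitrary illustrative thresholds.
function chooseEffort(t: QueryTraits): ReasoningEffort {
  if (t.needsMultiStepProof || t.estimatedSteps > 5) return "high";
  if (t.estimatedSteps > 2) return "medium";
  return "low";
}

// Options object intended for streamText/generateText; the providerOptions
// path for reasoning effort is an assumption, not confirmed by this page.
function buildRequest(prompt: string, t: QueryTraits) {
  return {
    model: "openai/o1",
    prompt,
    providerOptions: { openai: { reasoningEffort: chooseEffort(t) } },
  };
}

const easy = buildRequest("Convert 3 km to miles.", { needsMultiStepProof: false, estimatedSteps: 1 });
const hard = buildRequest("Prove there are infinitely many primes.", { needsMultiStepProof: true, estimatedSteps: 8 });
console.log(easy.providerOptions.openai.reasoningEffort); // low
console.log(hard.providerOptions.openai.reasoningEffort); // high
```

Keeping the choice in one function makes it easy to log which effort level each request used and to compare reasoning-token usage across levels.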

What To Consider When Choosing a Provider

  • Configuration: The reasoning_effort parameter lets you dial reasoning depth up or down per request. A single deployment can handle both lightweight queries and hard problems without switching models or endpoints.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
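Because one gateway credential replaces per-provider keys, an application only has to decide which gateway credential to present. The helper below is a hypothetical sketch: the environment variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN`, and the preference for the API key when both are set, are assumptions to confirm in the AI Gateway documentation.

```typescript
type GatewayAuth = { kind: "api-key" | "oidc"; token: string };

// Hypothetical resolver; the variable names are assumptions, not confirmed here.
function resolveGatewayAuth(env: Record<string, string | undefined>): GatewayAuth {
  const apiKey = env.AI_GATEWAY_API_KEY;
  if (apiKey) return { kind: "api-key", token: apiKey };
  const oidcToken = env.VERCEL_OIDC_TOKEN;
  if (oidcToken) return { kind: "oidc", token: oidcToken };
  throw new Error("No AI Gateway credential found: set an API key or provide an OIDC token");
}

console.log(resolveGatewayAuth({ AI_GATEWAY_API_KEY: "example-key" }).kind); // api-key
console.log(resolveGatewayAuth({ VERCEL_OIDC_TOKEN: "example-token" }).kind); // oidc
```

In a Node application you would typically pass `process.env` to such a helper; no provider keys for Azure or OpenAI appear anywhere in it.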

When to Use o1

Best For

  • Agentic pipelines: Function calling and structured outputs make o1 a complete agent backbone that combines deep reasoning with tool calls
  • Mathematical problem solving: Multi-step proofs and quantitative analysis that benefit from step-by-step chain-of-thought
  • Complex debugging: Architecture review where the model benefits from working through multiple approaches before committing
  • Mixed-difficulty workloads: reasoning_effort lets you optimize cost per request without switching models
  • Visual reasoning: Interpreting charts, diagrams, or handwritten notation as part of a larger analytical problem

Consider Alternatives When

  • Conversational or generative tasks: GPT-4o or GPT-4.1 respond faster and more cheaply when extended chain-of-thought isn't needed
  • Cost-sensitive STEM reasoning: o1-mini offers nearly equivalent math and coding scores at lower cost
  • Low-latency streaming: Reasoning token generation adds inherent latency that real-time responses may not tolerate
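The two lists above can be condensed into a simple routing rule. This is a hypothetical heuristic rather than guidance from the page: the workload flags are invented, the slugs `openai/gpt-4.1` and `openai/o1-mini` are illustrative and should be checked against the gateway's model list, and routing to the cheaper reasoning model only when tools or vision are not needed is an assumption.

```typescript
// Hypothetical workload descriptor mirroring the criteria in the lists above.
interface Workload {
  needsDeepReasoning: boolean; // multi-step math, debugging, proofs
  needsToolsOrVision: boolean; // function calling, structured outputs, image input
  latencySensitive: boolean;   // real-time streaming UX
  costSensitiveStem: boolean;  // math/coding where a cheaper reasoning model may suffice
}

// Illustrative slugs; only 'openai/o1' appears on this page.
function pickModel(w: Workload): string {
  // Extended chain-of-thought is wasted or too slow here.
  if (!w.needsDeepReasoning || w.latencySensitive) return "openai/gpt-4.1";
  // Assumption: use the cheaper reasoning model only when tools/vision aren't required.
  if (w.costSensitiveStem && !w.needsToolsOrVision) return "openai/o1-mini";
  return "openai/o1";
}

const base = { needsDeepReasoning: true, needsToolsOrVision: true, latencySensitive: false, costSensitiveStem: false };
console.log(pickModel(base));                                                   // openai/o1
console.log(pickModel({ ...base, needsDeepReasoning: false }));                 // openai/gpt-4.1
console.log(pickModel({ ...base, needsToolsOrVision: false, costSensitiveStem: true })); // openai/o1-mini
```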

Conclusion

The production o1 is a reasoning model that fits into real systems. Function calling, structured outputs, vision, system messages, a larger context window, and per-request reasoning control make it suitable for deployment. If your application needs a model that reasons carefully and then acts on its conclusions, route it through AI Gateway.

Frequently Asked Questions

  • How does reasoning_effort affect cost and latency in practice?

    Lower effort values reduce the number of reasoning tokens the model generates before answering, which directly lowers both cost (reasoning tokens are billed as output) and time-to-first-token. A pipeline that sets low effort for simple queries and high effort for complex ones can cut aggregate reasoning spend significantly.

  • What production capabilities does o1 have that the preview lacked?

    Function calling for tool use, developer system messages for behavioral control, Structured Outputs via JSON schema constrained decoding, and vision input for image reasoning. The preview supported none of these.

  • How much did the context window expand from the preview?

    The preview offered 128K tokens. The production o1 supports 200K tokens, enabling substantially longer documents, conversation histories, and multi-source inputs in reasoning tasks.

  • Can o1 be used as the reasoning layer in an agent that calls external APIs?

    Yes. Function calling support means o1 can request tool calls, receive the results in a follow-up turn, and incorporate them into its reasoning. Combined with Structured Outputs, it can produce machine-readable action plans that downstream orchestrators consume directly.

  • Why does o1 use 60% fewer reasoning tokens than the preview?

    OpenAI optimized the production model's reasoning efficiency. It reaches conclusions of equivalent quality with fewer intermediate steps, which translates to lower per-request cost and faster responses.

  • Is o1 appropriate for every query in a general-purpose chatbot?

    No. Reasoning token generation adds latency and cost that is wasted on simple questions. For general chat, GPT-4o or GPT-4.1 are faster and cheaper. Reserve o1 for the subset of queries that genuinely benefit from extended deliberation, or use reasoning_effort at a low setting to triage.