o1
o1 is the production reasoning model that combines extended chain-of-thought computation with full tool support, structured outputs, vision input, and a reasoning_effort parameter, delivering deeper problem-solving while using roughly 60% fewer reasoning tokens than o1-preview.
import { streamText } from 'ai'

const result = streamText({ model: 'openai/o1', prompt: 'Why is the sky blue?' })

Frequently Asked Questions
How does reasoning_effort affect cost and latency in practice?
Lower effort values reduce the number of reasoning tokens the model generates before answering, which directly lowers both cost (reasoning tokens are billed as output) and time-to-first-token. A pipeline that sets low effort for simple queries and high effort for complex ones can cut aggregate reasoning spend significantly.
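A minimal sketch of that triage pattern with the AI SDK follows; the providerOptions key, the reasoningEffort option name, and the isComplex flag are assumptions based on the OpenAI provider's options and may differ by SDK version.

import { generateText } from 'ai'

// Triage: spend fewer reasoning tokens on easy queries, more on hard ones.
// The isComplex flag is a stand-in for whatever classifier the pipeline uses.
async function answer(prompt: string, isComplex: boolean) {
  const { text } = await generateText({
    model: 'openai/o1',
    prompt,
    providerOptions: {
      openai: { reasoningEffort: isComplex ? 'high' : 'low' },
    },
  })
  return text
}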
What production capabilities does o1 have that the preview lacked?
Function calling for tool use, developer system messages for behavioral control, Structured Outputs via JSON schema constrained decoding, and vision input for image reasoning. The preview supported none of these.
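As an illustrative sketch of the Structured Outputs capability through the AI SDK's generateObject, assuming a made-up diagnosis schema (the field names are hypothetical):

import { generateObject } from 'ai'
import { z } from 'zod'

// Ask o1 for a machine-readable analysis instead of free text.
const { object } = await generateObject({
  model: 'openai/o1',
  schema: z.object({
    rootCause: z.string(),
    confidence: z.number().min(0).max(1),
    remediationSteps: z.array(z.string()),
  }),
  prompt: 'Diagnose why the nightly build failed given the attached log summary.',
})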
How much did the context window expand from the preview?
The preview offered 128K tokens. The production o1 supports 200K tokens, enabling substantially longer documents, conversation histories, and multi-source inputs in reasoning tasks.
Can o1 be used as the reasoning layer in an agent that calls external APIs?
Yes. Function calling support means o1 can invoke tools mid-reasoning, receive results, and incorporate them into its chain of thought. Combined with Structured Outputs, it can produce machine-readable action plans that downstream orchestrators consume directly.
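A minimal sketch of o1 as the reasoning layer of such an agent, using the AI SDK's tool helper. The searchFlights tool and its stubbed API call are hypothetical, and the exact helper and option names (tool, inputSchema, stepCountIs) follow one recent AI SDK version and may differ in others.

import { generateText, tool, stepCountIs } from 'ai'
import { z } from 'zod'

// Hypothetical external API call (stubbed for the sketch).
async function fetchFlightsFromApi(from: string, to: string, date: string) {
  return [{ flight: 'XX123', from, to, date, priceUsd: 199 }]
}

const { text } = await generateText({
  model: 'openai/o1',
  tools: {
    // Hypothetical tool the model can invoke mid-reasoning.
    searchFlights: tool({
      description: 'Search for flights between two airports on a given date',
      inputSchema: z.object({
        from: z.string(),
        to: z.string(),
        date: z.string(),
      }),
      execute: async ({ from, to, date }) => fetchFlightsFromApi(from, to, date),
    }),
  },
  // Allow the model to call the tool, read the result, and keep reasoning.
  stopWhen: stepCountIs(3),
  prompt: 'Find the cheapest nonstop from SFO to JFK next Friday and explain the tradeoffs.',
})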
Why does o1 use 60% fewer reasoning tokens than the preview?
OpenAI optimized the production model's reasoning efficiency: it reaches conclusions of equivalent quality with fewer intermediate steps, which translates to lower per-request cost and faster responses.
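To verify the savings on your own workloads, token usage is reported on the result. Whether reasoning tokens are broken out separately depends on the provider and SDK version, so treat the metadata field below as an assumption:

import { generateText } from 'ai'

const result = await generateText({
  model: 'openai/o1',
  prompt: 'Prove that the sum of two even integers is even.',
})

// Total token usage is always reported.
console.log(result.usage)

// Reasoning-token breakdown, if the provider exposes it (field name is an assumption).
console.log(result.providerMetadata?.openai?.reasoningTokens)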
Is o1 appropriate for every query in a general-purpose chatbot?
No. Reasoning token generation adds latency and cost that is wasted on simple questions. For general chat, GPT-4o or GPT-4.1 are faster and cheaper. Reserve o1 for the subset of queries that genuinely benefit from extended deliberation, or use reasoning_effort at a low setting to triage, as in the sketch below.
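One way to wire that routing up; the isHardProblem heuristic is hypothetical, and the providerOptions/reasoningEffort names are assumptions that may vary by SDK version.

import { generateText } from 'ai'

// Hypothetical heuristic: only send genuinely hard queries to o1.
function isHardProblem(prompt: string): boolean {
  return /prove|optimi[sz]e|debug|multi-step|plan/i.test(prompt)
}

async function chat(prompt: string) {
  if (!isHardProblem(prompt)) {
    // Fast, cheap path for ordinary chat.
    return generateText({ model: 'openai/gpt-4o', prompt })
  }
  // Deliberate path: o1 with effort dialed down by default.
  return generateText({
    model: 'openai/o1',
    prompt,
    providerOptions: { openai: { reasoningEffort: 'low' } },
  })
}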