OpenAI shipped o1 in ChatGPT on December 5, 2024, and brought it to the API on December 17 as the snapshot o1-2024-12-17. This is the point where OpenAI's reasoning architecture became production-ready. The September 2024 preview proved the concept: chain-of-thought reasoning scoring 83% on a qualifying exam for the International Mathematical Olympiad (IMO). But the preview shipped without the API features production systems depend on. The production o1 fills those gaps.
Function calling means o1 can participate in agentic workflows: querying databases, hitting APIs, and invoking tools mid-reasoning. Structured Outputs via constrained JSON schema decoding let downstream systems consume responses without fragile parsing. Developer messages, the reasoning-model successor to system messages, restore the ability to set behavioral constraints and context. Vision input enables reasoning over images: circuit diagrams, mathematical notation in photographs, and charts that require interpretation.
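Taken together, these features can be combined in a single Chat Completions request. A minimal sketch of the request body, assuming the standard Chat Completions request shape; the `get_order_status` tool, its schema, and the answer schema are hypothetical examples, not part of any real API:

```python
# Sketch of a request body combining o1's production API features:
# a developer message, a tool definition, and a JSON-schema-constrained
# response format. Tool name and schemas are hypothetical illustrations.
request_body = {
    "model": "o1-2024-12-17",
    "messages": [
        # Developer messages replace system messages for reasoning models.
        {"role": "developer", "content": "Answer only from tool results."},
        {"role": "user", "content": "Where is order 8841?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_order_status",  # hypothetical tool
                "description": "Look up an order's shipping status.",
                "parameters": {
                    "type": "object",
                    "properties": {"order_id": {"type": "string"}},
                    "required": ["order_id"],
                },
            },
        }
    ],
    # Structured Outputs: constrain the final answer to a JSON schema.
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "order_answer",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "status": {"type": "string"},
                    "eta_days": {"type": "integer"},
                },
                "required": ["status", "eta_days"],
                "additionalProperties": False,
            },
        },
    },
}
```

With the official Python SDK, sending this would look like `client.chat.completions.create(**request_body)`; the point is that one request can now carry tools, a schema-constrained output, and developer-set behavior at once.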
The efficiency gains are equally significant. o1 uses 60% fewer reasoning tokens on average than o1-preview for equivalent quality. Fewer reasoning tokens mean lower cost per request and a shorter time to first token. The context window of 200K tokens (expanded from the preview's 128K) accommodates the longer inputs that complex reasoning tasks demand.
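These savings are measurable: the API reports hidden reasoning tokens under `usage.completion_tokens_details` in the response. A small sketch of reading that field from a response parsed as a dict; the usage numbers below are made-up illustrations, not real measurements:

```python
def reasoning_token_share(usage: dict) -> float:
    """Fraction of billed completion tokens spent on hidden reasoning."""
    details = usage.get("completion_tokens_details", {})
    reasoning = details.get("reasoning_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    return reasoning / completion if completion else 0.0

# Illustrative usage block, shaped like a Chat Completions response's
# "usage" object (the numbers are invented for the example).
sample_usage = {
    "prompt_tokens": 1200,
    "completion_tokens": 5000,
    "completion_tokens_details": {"reasoning_tokens": 4200},
}

print(reasoning_token_share(sample_usage))  # 0.84
```

Tracking this ratio per request is a cheap way to verify that reasoning-token spend actually drops when you migrate a workload from o1-preview to o1.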
The reasoning_effort parameter, unique to the production o1, controls how deeply the model thinks. Set it low for questions where a quick chain of thought suffices. Set it high for problems that genuinely require extended deliberation. In a pipeline mixing easy and hard queries, this single parameter can cut aggregate reasoning token spend substantially.
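One way to exploit this in a mixed pipeline is a small router that classifies each query and sets the parameter accordingly. A sketch, assuming the standard Chat Completions request shape; the keyword-and-length heuristic is a hypothetical stand-in for whatever classifier the pipeline actually uses:

```python
def build_request(query: str) -> dict:
    """Pick a reasoning_effort level per query and build the request body."""
    # Hypothetical heuristic: treat long or proof-like queries as hard.
    hard = len(query) > 400 or any(
        kw in query.lower() for kw in ("prove", "derive", "optimize")
    )
    return {
        "model": "o1-2024-12-17",
        "reasoning_effort": "high" if hard else "low",
        "messages": [{"role": "user", "content": query}],
    }

easy = build_request("What timezone is UTC+9?")
hard = build_request("Prove that the algorithm terminates.")
print(easy["reasoning_effort"], hard["reasoning_effort"])  # low high
```

Because the router only touches the request body, it can sit in front of any existing call site; misclassified hard queries cost accuracy, so it pays to bias the heuristic toward "high" when in doubt.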