GPT-4.1
GPT-4.1 is OpenAI's April 2025 general-purpose API model, built for coding and instruction following. It offers a context window of 1.0M tokens, a 21-point SWE-bench Verified gain over GPT-4o, and a 75% prompt caching discount, all at a lower cost than GPT-4o.
import { streamText } from 'ai'
const result = streamText({ model: 'openai/gpt-4.1', prompt: 'Why is the sky blue?' })
Playground
Try out GPT-4.1 by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.
P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.
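The provider preference described above could be expressed in code roughly as follows. This is a minimal sketch, not the gateway's definitive API: the `providerOptions.gateway.order` shape, the helper name `preferProviders`, and the slugs `openai` and `azure` are assumptions to check against the AI Gateway docs before use.

```typescript
// Sketch only: the option shape (providerOptions.gateway.order) and the
// provider slugs used here are assumptions, not confirmed by this page.
type GatewayProviderOptions = { gateway: { order: string[] } };

// Hypothetical helper: turns a list of copied provider slugs into an
// ordered preference, dropping blanks and duplicates.
function preferProviders(slugs: string[]): GatewayProviderOptions {
  const order: string[] = [];
  for (const raw of slugs) {
    const slug = raw.trim().toLowerCase();
    if (slug !== '' && order.indexOf(slug) === -1) {
      order.push(slug);
    }
  }
  if (order.length === 0) {
    throw new Error('at least one provider slug is required');
  }
  return { gateway: { order } };
}

const providerOptions = preferProviders(['openai', 'azure', 'openai']);
console.log(JSON.stringify(providerOptions));

// The result would then be passed alongside the model, for example:
// streamText({ model: 'openai/gpt-4.1', prompt: '...', providerOptions })
```

Keeping the preference in one helper makes it easier to change the order later without touching each call site.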
About GPT-4.1
GPT-4.1 arrived on April 14, 2025 alongside two smaller siblings: GPT-4.1 mini and GPT-4.1 nano. OpenAI built this release around measurable improvements in three areas (coding, instruction following, and long context) rather than incremental gains across a broad benchmark suite.
Coding was the centerpiece. GPT-4.1 scored 21.4 points higher than GPT-4o on SWE-bench Verified, the benchmark that measures a model's ability to autonomously resolve real GitHub issues. It also scored 26.6 points above GPT-4.5. For teams building AI-assisted development tools, this translates to better codebase comprehension, more correct patches, and stronger adherence to repository-specific conventions.
Instruction adherence also improved substantially. On Scale AI's MultiChallenge benchmark, GPT-4.1 reached 38.3%, a 10.5-point increase over GPT-4o. This matters for any pipeline where the model must follow a multi-step specification exactly: structured data extraction, form filling, and compliance document processing.
Long context is the third area. The context window of 1.0M tokens comes with accurate retrieval across the full range, not just nominal capacity. OpenAI also restructured pricing: GPT-4.1 costs less than GPT-4o for equivalent queries, the prompt caching discount increased to 75%, and long-context requests no longer carry surcharges. The knowledge cutoff is June 2024.
What To Consider When Choosing a Provider
- Configuration: Agentic coding pipelines that repeatedly send large system prompts or repository context benefit significantly from the 75% prompt caching discount. This is a deeper discount than earlier OpenAI models offered.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use GPT-4.1
Best For
- Autonomous software engineering: Code generation, debugging, and refactoring across full repositories using the context window of 1.0M tokens
- Complex instruction following: Structured extraction, compliance workflows, and multi-constraint document processing where precision matters
- Long-document analysis: Legal contracts, research papers, and entire codebases processed in a single pass without chunking
- Multimodal workflows: Applications that combine video, images, and extended text in a single request
- Cache-heavy pipelines: Large repeated prompts where the 75% caching discount materially reduces cost
Consider Alternatives When
- Simpler tasks: GPT-4.1 mini or nano can handle straightforward work at substantially lower cost
- STEM-heavy reasoning: The o-series reasoning models may yield higher accuracy on chain-of-thought workloads
- Ultra-low latency: When response time is the primary requirement, a smaller model such as GPT-4.1 mini or nano trades some capability for speed
Conclusion
GPT-4.1 is OpenAI's general-purpose API model for code-heavy and instruction-intensive workloads. The context window of 1.0M tokens, strong SWE-bench performance, and restructured pricing make it a clear upgrade path from GPT-4o for teams that need advanced coding ability without premium cost.
Frequently Asked Questions
How significant is the SWE-bench improvement?
GPT-4.1 improved 21.4 points over GPT-4o and 26.6 points over GPT-4.5 on SWE-bench Verified. This was the largest coding benchmark gain in a single OpenAI model release at the time.
What does the context window of 1.0M tokens mean in practice?
You can pass an entire large codebase, a full legal document set, or a long multi-session conversation history in a single request. GPT-4.1 retrieves information accurately at all positions within that range.
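Before sending a whole codebase or document set in one request, it can help to estimate whether it fits. The sketch below is a rough pre-flight check under stated assumptions: the 4-characters-per-token ratio is a common heuristic rather than GPT-4.1's actual tokenizer, and the output reserve and helper names are illustrative.

```typescript
// From this page: the context window is 1.0M tokens.
const CONTEXT_WINDOW_TOKENS = 1000000;
// Assumption: roughly 4 characters per token. Real counts depend on the
// tokenizer and the content, so treat the result as an estimate only.
const CHARS_PER_TOKEN = 4;

// Hypothetical helper: rough token estimate for one piece of text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Hypothetical helper: checks whether all documents plus a reserve for the
// model's reply fit in the window. The default reserve is an assumption.
function fitsInContext(
  documents: string[],
  reservedForOutput: number = 32000
): { totalTokens: number; fits: boolean } {
  const totalTokens = documents.reduce(
    (sum, doc) => sum + estimateTokens(doc),
    0
  );
  return {
    totalTokens,
    fits: totalTokens + reservedForOutput <= CONTEXT_WINDOW_TOKENS,
  };
}

// Two stand-in "files" of 2,000,000 and 1,000,000 characters.
const files = ['x'.repeat(2000000), 'y'.repeat(1000000)];
console.log(fitsInContext(files));
```

If the estimate lands close to the limit, counting tokens with a real tokenizer is the safer next step.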
How does prompt caching work with GPT-4.1?
Repeated input tokens, such as system prompts or shared context, are cached at a 75% discount off the standard input token price. This is especially valuable for agentic loops that resend the same context on every iteration.
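To see how the discount adds up in an agentic loop, here is a small worked calculation. It is a sketch under assumptions: the input price is a placeholder rather than GPT-4.1's actual rate, the function names are hypothetical, and it assumes every request after the first hits the cache for the whole shared context.

```typescript
// Assumption: placeholder price in USD per 1M input tokens, NOT the real rate.
const INPUT_PRICE_PER_MILLION = 1.0;
// From this page: cached input tokens are billed at a 75% discount.
const CACHE_DISCOUNT = 0.75;

// Input cost of one request, split into uncached and cached tokens.
function inputCost(uncachedTokens: number, cachedTokens: number): number {
  const fullRate = INPUT_PRICE_PER_MILLION / 1000000;
  return (
    uncachedTokens * fullRate +
    cachedTokens * fullRate * (1 - CACHE_DISCOUNT)
  );
}

// Total input cost of a loop that resends the same shared context each time.
// Simplification: the first request populates the cache, later ones hit it.
function loopInputCost(
  iterations: number,
  sharedTokens: number,
  newTokensPerIteration: number,
  caching: boolean
): number {
  let total = 0;
  for (let i = 0; i < iterations; i++) {
    const cacheHit = caching && i > 0;
    total += cacheHit
      ? inputCost(newTokensPerIteration, sharedTokens)
      : inputCost(sharedTokens + newTokensPerIteration, 0);
  }
  return total;
}

// 10 iterations, 100k shared context, 2k new tokens per iteration.
const withoutCache = loopInputCost(10, 100000, 2000, false);
const withCache = loopInputCost(10, 100000, 2000, true);
console.log(withoutCache.toFixed(3), withCache.toFixed(3));
```

Under these assumed numbers the cached loop costs roughly a third of the uncached one. The actual saving depends on real prices and cache hit rates.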
What is GPT-4.1's knowledge cutoff?
The knowledge cutoff is June 2024, which is later than GPT-4o's.
Does GPT-4.1 handle video input?
Yes. It accepts text, images, and video, and scored 72.0% on Video-MME's long-context, no-subtitles benchmark.
What are typical latency characteristics?
This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.