GPT-4.1
GPT-4.1 is OpenAI's general-purpose API model, released in April 2025 and purpose-built for coding and instruction following. It offers a 1.0M-token context window, a 21-point SWE-bench gain over GPT-4o, and a 75% prompt caching discount, all at a lower cost than its predecessor.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'openai/gpt-4.1',
  prompt: 'Why is the sky blue?',
})
```

Frequently Asked Questions
How significant is the SWE-bench improvement?
GPT-4.1 improved 21.4 points over GPT-4o and 26.6 points over GPT-4.5 on SWE-bench Verified. This was the largest coding benchmark gain in a single OpenAI model release at the time.
What does the context window of 1.0M tokens mean in practice?
You can pass an entire large codebase, a full legal document set, or a long multi-session conversation history in a single request. GPT-4.1 retrieves information accurately at all positions within that range.
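As a concrete sketch of fitting a codebase into one request, the hypothetical helper below (not part of the AI SDK; the name and file format are assumptions for illustration) concatenates source files into a single prompt string, which you would then send in one call, e.g. via the AI SDK's `generateText({ model: 'openai/gpt-4.1', prompt })`.

```typescript
// Hypothetical helper: flatten many source files into one prompt that
// fits inside GPT-4.1's 1.0M-token context window.
type SourceFile = { path: string; content: string }

function buildCodebasePrompt(files: SourceFile[], question: string): string {
  // Prefix each file with a path header so the model can attribute code
  // to its location when answering.
  const body = files
    .map((f) => `--- ${f.path} ---\n${f.content}`)
    .join('\n\n')
  return `${body}\n\nQuestion: ${question}`
}
```

A rough rule of thumb is ~4 characters per token for code, so staying under a few million characters of concatenated source keeps you comfortably inside the window.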
How does prompt caching work with GPT-4.1?
Repeated input tokens, such as system prompts or shared context, are cached at a 75% discount off the standard input token price. This is especially valuable for agentic loops that resend the same context on every iteration.
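The arithmetic behind that discount can be sketched as follows. This is an illustrative estimate only: `pricePerToken` is a placeholder (actual per-million-token rates vary by provider), and the split between cached and fresh tokens is something the billing system determines, not your code. Cached tokens are billed at 25% of the standard input rate, i.e. the 75% discount.

```typescript
// Estimated input cost for one iteration of an agentic loop, assuming a
// placeholder per-token price. The repeated prefix (system prompt plus
// shared context) is billed at 25% of the standard rate once cached;
// only the new tokens for this turn pay full price.
function inputCost(
  cachedTokens: number, // repeated prefix resent each iteration
  freshTokens: number,  // tokens new to this iteration
  pricePerToken: number,
): number {
  return cachedTokens * pricePerToken * 0.25 + freshTokens * pricePerToken
}
```

For a loop that resends a 50,000-token shared context on every turn, caching cuts that portion of the input bill by 75%, which compounds quickly over many iterations.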
What is GPT-4.1's knowledge cutoff?
June 2024, updated from GPT-4o's earlier cutoff.
Does GPT-4.1 handle video input?
Yes. It accepts text, images, and video, and scored 72.0% on Video-MME's long-context, no-subtitles benchmark.
What are typical latency characteristics?
This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.