
GPT-4.1

GPT-4.1 is OpenAI's April 2025 general-purpose API model, purpose-built for coding and instruction following. It offers a 1.0M-token context window, a 21-point SWE-bench gain over GPT-4o, and a 75% prompt caching discount, all at a lower price than its predecessor.

File Input · Tool Use · Vision (Image)
index.ts
import { streamText } from 'ai'

// Start a streaming completion against GPT-4.1 via the gateway
const result = streamText({
  model: 'openai/gpt-4.1',
  prompt: 'Why is the sky blue?',
})

// Print tokens to stdout as they arrive
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

Frequently Asked Questions

  • How significant is the SWE-bench improvement?

    GPT-4.1 improved 21.4 points over GPT-4o and 26.6 points over GPT-4.5 on SWE-bench Verified. This was the largest coding benchmark gain in a single OpenAI model release at the time.

  • What does the context window of 1.0M tokens mean in practice?

    You can pass an entire large codebase, a full legal document set, or a long multi-session conversation history in a single request. GPT-4.1 retrieves information accurately at all positions within that range.
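    As a rough sketch of sizing a request against that window, you can estimate token counts before sending. The ~4 characters per token figure is a common heuristic, not an official tokenizer value; use a real tokenizer for exact counts:

    // Heuristic: ~4 characters per token for English prose and code.
    // This is an assumption for ballpark estimates only.
    const CHARS_PER_TOKEN = 4
    const CONTEXT_WINDOW = 1_000_000

    function estimateTokens(text: string): number {
      return Math.ceil(text.length / CHARS_PER_TOKEN)
    }

    // Check whether a set of documents fits, leaving room for the model's output
    function fitsInContext(texts: string[], reservedForOutput = 32_000): boolean {
      const total = texts.reduce((sum, t) => sum + estimateTokens(t), 0)
      return total + reservedForOutput <= CONTEXT_WINDOW
    }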

  • How does prompt caching work with GPT-4.1?

    Repeated input tokens, such as system prompts or shared context, are cached at a 75% discount off the standard input token price. This is especially valuable for agentic loops that resend the same context on every iteration.
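    To see why this matters for agentic loops, here is a back-of-the-envelope cost sketch. The per-token price and token counts are made-up placeholders; only the 75% cached-input discount comes from the description above:

    // Hypothetical pricing for illustration only; check current rates.
    const INPUT_PRICE_PER_TOKEN = 2.0 / 1_000_000 // assumed $2.00 per 1M input tokens
    const CACHE_DISCOUNT = 0.75                   // cached input tokens cost 75% less

    // Cost of one request where `cachedTokens` repeat a previously sent
    // prefix (e.g. the system prompt and shared context of an agent loop)
    function inputCost(freshTokens: number, cachedTokens: number): number {
      const cachedPrice = INPUT_PRICE_PER_TOKEN * (1 - CACHE_DISCOUNT)
      return freshTokens * INPUT_PRICE_PER_TOKEN + cachedTokens * cachedPrice
    }

    // An agent loop resending a 50k-token context for 20 iterations:
    const withoutCache = inputCost(50_000 * 20, 0)          // every token billed fresh
    const withCache = inputCost(50_000, 50_000 * 19)        // first pass fresh, rest cached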

  • What is GPT-4.1's knowledge cutoff?

June 2024, up from GPT-4o's October 2023 cutoff.

  • Does GPT-4.1 handle video input?

Not directly: the API accepts text and image input. On video understanding, evaluated by passing sampled frames as images, GPT-4.1 scored 72.0% on Video-MME's long-context, no-subtitles category.

  • What are typical latency characteristics?

    This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.