GPT-4.1
GPT-4.1 is OpenAI's general-purpose API model, released in April 2025 and purpose-built for coding and instruction following. It offers a 1.0M-token context window, a 21-point SWE-bench gain over GPT-4o, and a 75% prompt caching discount, all at a lower cost than its predecessor.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'openai/gpt-4.1',
  prompt: 'Why is the sky blue?',
})
```

Frequently Asked Questions
How significant is the SWE-bench improvement?
GPT-4.1 improved 21.4 points over GPT-4o and 26.6 points over GPT-4.5 on SWE-bench Verified. This was the largest coding benchmark gain in a single OpenAI model release at the time.
What does the context window of 1.0M tokens mean in practice?
You can pass an entire large codebase, a full legal document set, or a long multi-session conversation history in a single request. GPT-4.1 retrieves information accurately at all positions within that range.
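As a concrete sketch of fitting a codebase into one request, the hypothetical helper below (not part of the AI SDK; the name and file format are assumptions for illustration) concatenates source files into a single prompt string, which you would then send in one call, e.g. via the AI SDK's `generateText({ model: 'openai/gpt-4.1', prompt })`.

```typescript
// Hypothetical helper: flatten many source files into one prompt that
// fits inside GPT-4.1's 1.0M-token context window.
type SourceFile = { path: string; content: string }

function buildCodebasePrompt(files: SourceFile[], question: string): string {
  // Prefix each file with a path header so the model can attribute code
  // to its location when answering.
  const body = files
    .map((f) => `--- ${f.path} ---\n${f.content}`)
    .join('\n\n')
  return `${body}\n\nQuestion: ${question}`
}
```

A rough rule of thumb is ~4 characters per token for code, so staying under a few million characters of concatenated source keeps you comfortably inside the window.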
How does prompt caching work with GPT-4.1?
Repeated input tokens, such as system prompts or shared context, are cached at a 75% discount off the standard input token price. This is especially valuable for agentic loops that resend the same context on every iteration.
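The arithmetic behind that discount can be sketched as follows. This is an illustrative estimate only: `pricePerToken` is a placeholder (actual per-million-token rates vary by provider), and the split between cached and fresh tokens is something the billing system determines, not your code. Cached tokens are billed at 25% of the standard input rate, i.e. the 75% discount.

```typescript
// Estimated input cost for one iteration of an agentic loop, assuming a
// placeholder per-token price. The repeated prefix (system prompt plus
// shared context) is billed at 25% of the standard rate once cached;
// only the new tokens for this turn pay full price.
function inputCost(
  cachedTokens: number, // repeated prefix resent each iteration
  freshTokens: number,  // tokens new to this iteration
  pricePerToken: number,
): number {
  return cachedTokens * pricePerToken * 0.25 + freshTokens * pricePerToken
}
```

For a loop that resends a 50,000-token shared context on every turn, caching cuts that portion of the input bill by 75%, which compounds quickly over many iterations.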
What is GPT-4.1's knowledge cutoff?
June 2024, updated from GPT-4o's earlier cutoff.
Does GPT-4.1 handle video input?
Yes. It accepts text, images, and video, and scored 72.0% on Video-MME's long-context, no-subtitles benchmark.
What are typical latency characteristics?
This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.