GPT-4 Turbo

GPT-4 Turbo launched at OpenAI DevDay 2023 with a context window of 128K tokens, built-in vision, JSON mode, and a knowledge cutoff of April 2023, all at reduced input and output prices compared to the original GPT-4.

Tool Use, Vision (Image)
index.ts
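import { streamText } from 'ai'

const result = streamText({
  model: 'openai/gpt-4-turbo',
  prompt: 'Why is the sky blue?',
})

// Print each chunk of text as it streams in
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}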

Playground

Try out GPT-4 Turbo by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider: OpenAI (Legal: Terms, Privacy)
Context: 128K
Latency: 1.0s
Throughput: 30tps
Input: $10.00/M
Output: $30.00/M
Release Date: 11/06/2023

Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
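
As a rough illustration of what a P50 figure means for the throughput and latency metrics above, the sketch below computes a percentile from a handful of samples using the nearest-rank method. The sample values are invented for the example, and the page does not say which percentile method AI Gateway uses, so treat the method as an assumption.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it. (Assumed method; the page
// does not document how AI Gateway computes its P50.)
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('percentile() needs at least one sample');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up time-to-first-token samples, in milliseconds.
const ttftMs = [820, 1040, 960, 1310, 990];
console.log(percentile(ttftMs, 50)); // 990
```

A P50 is less sensitive to a few slow outliers than an average, which is one reason it is a common choice for latency dashboards.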

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

About GPT-4 Turbo

GPT-4 Turbo was announced at OpenAI's first DevDay conference on November 6, 2023. It introduced a context window of 128K tokens, enough to hold more than 300 pages of text in a single prompt. OpenAI also pushed the knowledge cutoff to April 2023, a meaningful update for applications dealing with events from the first half of that year.
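
The "more than 300 pages" figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes roughly 0.75 English words per token and 300 words per page. Both are common rules of thumb rather than values from OpenAI's tokenizer, and real documents will vary.

```typescript
// Rule-of-thumb assumptions, not measured values.
const WORDS_PER_TOKEN = 0.75;
const WORDS_PER_PAGE = 300;

// Estimate how many pages of prose fit in a given token budget.
function estimatePages(contextTokens: number): number {
  return Math.floor((contextTokens * WORDS_PER_TOKEN) / WORDS_PER_PAGE);
}

console.log(estimatePages(128_000)); // 320
```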

Two features made GPT-4 Turbo immediately practical for production integrations. JSON mode let developers reliably request structured JSON output, simplifying downstream parsing without brittle prompt engineering. Vision input enabled image analysis directly within the Chat Completions API. Passing a URL or base64-encoded image let the model generate captions, extract data from photographs, and interpret diagrams or documents with figures.
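
The two features can be combined in one request. The following is a minimal sketch of a Chat Completions request body that enables JSON mode and attaches an image. The helper names (buildImageExtractionRequest, toDataUrl, parseJsonReply) and the prompt wording are hypothetical; the response_format value and the image_url content part follow the Chat Completions message format. Sending the body is left to whatever HTTP client or SDK you already use.

```typescript
// Content parts in the Chat Completions message format.
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image_url'; image_url: { url: string } };

interface ChatRequest {
  model: string;
  response_format: { type: 'json_object' };
  messages: {
    role: 'system' | 'user';
    content: string | (TextPart | ImagePart)[];
  }[];
}

// Hypothetical helper: wrap already-base64-encoded bytes as a data URL.
function toDataUrl(base64: string, mimeType: string): string {
  return `data:${mimeType};base64,${base64}`;
}

// Hypothetical helper: build a request that asks for an image description
// as a JSON object. `imageUrl` may be an https URL or a data URL.
function buildImageExtractionRequest(imageUrl: string): ChatRequest {
  return {
    model: 'openai/gpt-4-turbo',
    response_format: { type: 'json_object' },
    messages: [
      // JSON mode expects the word "JSON" to appear somewhere in the messages.
      { role: 'system', content: 'Reply with a JSON object with keys "caption" and "objects".' },
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Describe this image.' },
          { type: 'image_url', image_url: { url: imageUrl } },
        ],
      },
    ],
  };
}

// Parse the reply defensively: output cut off by a token limit can still
// be incomplete JSON, so return null instead of throwing.
function parseJsonReply(content: string): unknown | null {
  try {
    return JSON.parse(content);
  } catch {
    return null;
  }
}
```

JSON mode constrains syntax only, so the keys requested in the system message are a convention the application still has to validate.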

Pricing was also reduced compared to the original GPT-4, with lower rates on both input and output tokens. This made GPT-4-class reasoning accessible to a much wider range of applications and unlocked use cases where the economics of the original GPT-4 had been prohibitive.
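
To see what the pricing means for a concrete workload, a per-request cost can be estimated from the rates listed for the OpenAI provider on this page ($10.00 per million input tokens, $30.00 per million output tokens). This is a sketch only: the function name is hypothetical, the token counts are made up, and listed rates can change.

```typescript
// Rates as listed on this page at the time of writing (USD per million tokens).
const INPUT_USD_PER_MILLION = 10;
const OUTPUT_USD_PER_MILLION = 30;

// Hypothetical helper: estimate the cost of one request in USD.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  const totalMicroUsd =
    inputTokens * INPUT_USD_PER_MILLION + outputTokens * OUTPUT_USD_PER_MILLION;
  return totalMicroUsd / 1_000_000;
}

// A 100,000-token document with a 2,000-token summary:
console.log(estimateCostUsd(100_000, 2_000)); // 1.06
```

With long-context prompts the input side dominates, so a large document costs far more to read than the model's reply costs to generate.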

What To Consider When Choosing a Provider

  • Configuration: If you built integrations around GPT-4 Turbo's specific JSON mode behavior or vision API surface, verify provider-level feature parity before routing production traffic. Some capabilities behave consistently across providers while others may have provider-specific nuances.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
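
The authentication point above can be sketched as a small helper that picks whichever credential is available and turns it into a bearer header. The environment variable names AI_GATEWAY_API_KEY and VERCEL_OIDC_TOKEN and the helper name are assumptions for illustration; check the AI Gateway documentation for the names your deployment uses.

```typescript
// Hypothetical helper: prefer an API key, fall back to an OIDC token.
// The env var names are assumptions, not confirmed by this page.
function gatewayAuthHeader(
  env: Record<string, string | undefined>,
): { Authorization: string } {
  const token = env.AI_GATEWAY_API_KEY ?? env.VERCEL_OIDC_TOKEN;
  if (!token) {
    throw new Error('No AI Gateway credential found in the environment');
  }
  return { Authorization: `Bearer ${token}` };
}

// Usage with an explicit env object (for example, process.env in Node.js):
console.log(gatewayAuthHeader({ AI_GATEWAY_API_KEY: 'example-key' }));
```

Note that no OpenAI key appears anywhere: the gateway credential is the only secret the application holds.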

When to Use GPT-4 Turbo

Best For

  • Long-document analysis: Legal review or research tasks that benefit from holding an entire lengthy document in context at once
  • Reliable JSON output: Applications requiring structured responses through the built-in JSON mode
  • Vision-capable pipelines: Image captioning, document processing with figures, and screenshot analysis
  • Knowledge through April 2023: Use cases that depend on knowledge of events up to the April 2023 cutoff
  • Cost-efficient GPT-4 reasoning: Complex reasoning where GPT-4-class depth is required at a cost tier below the original GPT-4

Consider Alternatives When

  • Larger context needed: You need a window beyond 128K tokens; GPT-4.1 supports up to 1M tokens
  • Native audio required: GPT-4o handles audio input and output natively
  • Later snapshot features: You want the Structured Outputs guarantees or creative writing enhancements introduced in later GPT-4o snapshots
  • Cost-driven workloads: GPT-4o or GPT-4o mini may provide sufficient quality at lower cost

Conclusion

GPT-4 Turbo marked the point where GPT-4-class reasoning became broadly economical, pairing a context window of 128K tokens and vision input with reduced pricing for production deployment. For document-heavy, vision-enabled, or JSON-structured workloads designed around its specific feature set, it remains a well-understood and reliable option through AI Gateway.

Frequently Asked Questions

  • What made GPT-4 Turbo's context window significant?

    At 128K tokens it holds more than 300 pages of text, enabling workflows like full-codebase review, long legal document analysis, and extended multi-session conversation history without chunking.

  • How does JSON mode in GPT-4 Turbo work?

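    Set response_format to { type: "json_object" } and the model is constrained to emit syntactically valid JSON. The messages must also mention JSON explicitly, typically in the system prompt. JSON mode does not enforce any particular schema, which is how it differs from the stricter JSON Schema-based Structured Outputs introduced later with gpt-4o-2024-08-06.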

  • Does GPT-4 Turbo support image inputs?

    Yes. You can pass image URLs or base64-encoded images alongside text in the messages array. The model can analyze photographs, diagrams, screenshots, and documents with embedded figures.

  • What is GPT-4 Turbo's knowledge cutoff?

    April 2023. This was an update from the September 2021 cutoff of earlier GPT-4 models and makes the model aware of events from the first half of 2023.

  • How does GPT-4 Turbo pricing compare to the original GPT-4?

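    At launch, OpenAI priced GPT-4 Turbo at $10 per million input tokens and $30 per million output tokens, versus $30 and $60 for the original 8K-context GPT-4. Current rates are listed on this page. They reflect the providers routing through AI Gateway and shift when providers update their pricing.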

  • Can I route GPT-4 Turbo requests through AI Gateway without storing provider API keys?

    Yes. AI Gateway handles authentication using its own API key or OIDC token system, so you don't need to embed OpenAI credentials in your deployment environment.

  • What are typical latency characteristics?

    This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.