GPT-5 nano

GPT-5 nano is the fastest and most affordable model in the GPT-5 family, designed for high-throughput, low-latency tasks like classification, routing, autocomplete, and lightweight inference at scale.

File Input, Reasoning, Tool Use, Vision (Image), Image Gen, Implicit Caching
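index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'openai/gpt-5-nano',
  prompt: 'Why is the sky blue?',
})
// Print the response text as it streams in.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}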

Playground

Try out GPT-5 nano by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

| Provider | Legal | Context | Latency | Throughput | Input | Output | Cache Read | Cache Write | Web Search (Per Query) | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|
| Azure | Terms, Privacy | 400K | 4.5s | | $0.05/M | $0.40/M | $0.01/M | | $14/K + input costs | 08/07/2025 |
| OpenAI | Terms, Privacy | 400K | 7.5s | 121tps | $0.05/M | $0.40/M | $0.01/M | | $10.00/K + input costs | 08/07/2025 |
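
To make the per-token prices above concrete, here is a rough cost estimator. It is only a sketch: it assumes the listed rates ($0.05 per million input tokens, $0.40 per million output tokens, $0.01 per million cache-read tokens), assumes cached tokens are billed at the read rate instead of the input rate, and ignores web search fees. The `Usage` shape and the `estimateCostUSD` name are hypothetical, not part of any SDK.

```typescript
// Rates copied from the provider table, in USD per one million tokens.
const RATES = { input: 0.05, output: 0.4, cacheRead: 0.01 } as const;

interface Usage {
  inputTokens: number; // total prompt tokens, including cached ones
  outputTokens: number;
  cachedInputTokens?: number; // subset of inputTokens served from cache (assumption)
}

function estimateCostUSD(u: Usage): number {
  const cached = u.cachedInputTokens ?? 0;
  const uncached = u.inputTokens - cached;
  const microDollars =
    uncached * RATES.input + cached * RATES.cacheRead + u.outputTokens * RATES.output;
  return microDollars / 1_000_000;
}
```

Under these assumptions, a request with 1,000 input tokens and 20 output tokens costs about $0.000058, so one million such requests come to roughly $58.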
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
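
Throughput and time to first token can be combined into a rough estimate of total response time: TTFT plus output length divided by throughput. The sketch below assumes latency expressed in seconds and steady streaming at the P50 rate, which real traffic will not match exactly. `estimateResponseSeconds` is a hypothetical helper.

```typescript
// Rough end-to-end time for one streamed response, in seconds.
function estimateResponseSeconds(
  ttftSeconds: number,
  tokensPerSecond: number,
  outputTokens: number,
): number {
  if (tokensPerSecond <= 0) {
    throw new RangeError('tokensPerSecond must be positive');
  }
  return ttftSeconds + outputTokens / tokensPerSecond;
}
```

For example, with a 7.5 s TTFT and 121 tps, a 242-token reply would take about 7.5 + 2 = 9.5 seconds.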

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by OpenAI

| Model | Context | Latency | Throughput | Input | Output | Cache Read | Cache Write | Web Search (Per Query) | Providers | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|
| | 1M | 3.2s | 68tps | $5.00/M | $30.00/M | $0.5/M | | $10.00/K + input costs | Azure, OpenAI | 04/24/2026 |
| | 400K | 1.4s | 257tps | $0.75/M | $4.50/M | $0.07/M | | $10.00/K + input costs | Azure, OpenAI | 03/17/2026 |
| | 400K | 0.4s | 21tps | $0.20/M | $1.25/M | $0.02/M | | $10.00/K + input costs | Azure, OpenAI | 03/17/2026 |
| | 1.1M | 0.9s | 59tps | $2.50/M | $15.00/M | $0.25/M | | $10.00/K + input costs | Azure, OpenAI | 03/05/2026 |
| | 128K | 0.6s | 108tps | $1.25/M | $10.00/M | $0.13/M | | $10.00/K + input costs | Azure, OpenAI | 11/12/2025 |
| | 131K | 0.1s | 928tps | $0.35/M | $0.75/M | $0.25/M | | | Baseten, Bedrock, Cerebras, +5 | 08/05/2025 |

About GPT-5 nano

GPT-5 nano was released on August 7, 2025 as the entry-level tier of the GPT-5 model family. It's optimized for the highest throughput and lowest latency in the family, targeting workloads where speed and cost matter more than reasoning depth.

Despite being the smallest GPT-5 variant, GPT-5 nano benefits from the family's architectural improvements. It handles classification, routing, extraction, and simple generation, with a generational quality improvement over GPT-4.1 nano. Its 400K-token context window is large for a model at this tier, so it can process long inputs even when outputs stay short.
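
A quick pre-flight check can tell you whether a long input is likely to fit before you send it. This is a minimal sketch under two assumptions that this page does not confirm: that input and reserved output tokens share the 400K window, and that English text averages about four characters per token. Both helper names are hypothetical; use a real tokenizer when accuracy matters.

```typescript
const CONTEXT_WINDOW_TOKENS = 400_000; // context size listed on this page

// Assumption: input tokens and reserved output tokens share one window.
function fitsContext(
  inputTokens: number,
  reservedOutputTokens: number,
  windowTokens: number = CONTEXT_WINDOW_TOKENS,
): boolean {
  return (
    inputTokens >= 0 &&
    reservedOutputTokens >= 0 &&
    inputTokens + reservedOutputTokens <= windowTokens
  );
}

// Very rough heuristic (assumption): about 4 characters per token.
function roughTokenCount(text: string): number {
  return Math.ceil(text.length / 4);
}
```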

The model is designed to serve as a building block in larger systems: classifying incoming requests, routing them to appropriate handlers, extracting key fields from documents, and providing instant responses for simple queries, all at a cost that makes per-request inference viable for the highest-traffic applications.
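
The routing pattern described above might look like the following sketch. The label set, the routing table, and the function names are hypothetical, and the model slugs other than `openai/gpt-5-nano` are assumed to follow the same naming pattern. The `classify` callback stands in for a GPT-5 nano call that returns one label as text.

```typescript
type Label = 'simple' | 'complex' | 'code';

// Hypothetical routing table; slugs other than gpt-5-nano are assumptions.
const ROUTES: Record<Label, string> = {
  simple: 'openai/gpt-5-nano',
  complex: 'openai/gpt-5',
  code: 'openai/gpt-5-codex',
};

// Map the classifier's raw text output to a model slug.
// Unknown or malformed labels fall back to the more capable model.
function pickModel(rawLabel: string, fallback: Label = 'complex'): string {
  const label = rawLabel.trim().toLowerCase();
  const known = Object.prototype.hasOwnProperty.call(ROUTES, label);
  return known ? ROUTES[label as Label] : ROUTES[fallback];
}

// `classify` is injected so the routing logic stays testable without network access.
async function route(
  text: string,
  classify: (text: string) => Promise<string>,
): Promise<string> {
  return pickModel(await classify(text));
}
```

Falling back to the larger model on an unrecognized label trades a little cost for safety: a misrouted hard request is usually worse than an over-served easy one.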

What To Consider When Choosing a Provider

  • Configuration: GPT-5 nano prioritizes throughput and latency over reasoning depth. It's the right choice when you need fast answers to simple questions at minimal cost.
  • Configuration: At its price point, GPT-5 nano is practical as a classifier, router, or preprocessor that runs on every request, deciding which downstream model or action to invoke.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
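
As an illustration of the single-credential model, the sketch below builds an authorization header from either an API key or an OIDC token. The environment variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN`, the Bearer scheme, and the preference for the API key are all assumptions here; check the documentation for the exact mechanism. `resolveGatewayAuthHeader` is a hypothetical helper.

```typescript
// Assumed variable names: AI_GATEWAY_API_KEY (API key) and VERCEL_OIDC_TOKEN (OIDC token).
function resolveGatewayAuthHeader(
  env: Record<string, string | undefined>,
): { Authorization: string } {
  // Assumption: prefer the explicit API key, then fall back to the OIDC token.
  const token = env.AI_GATEWAY_API_KEY || env.VERCEL_OIDC_TOKEN;
  if (!token) {
    throw new Error('Set AI_GATEWAY_API_KEY or VERCEL_OIDC_TOKEN');
  }
  return { Authorization: `Bearer ${token}` };
}
```

Either way, no OpenAI or Azure credential appears in application code.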

When to Use GPT-5 nano

Best For

  • Real-time classification: Sentiment analysis, intent detection, and topic labeling at high request volume
  • Routing and triage: Deciding which model or workflow handles each incoming request
  • Autocomplete and suggestions: Sub-second inline suggestions in editors and search interfaces
  • Lightweight extraction: Pulling specific fields from structured or semi-structured text
  • Cost-sensitive batch processing: Millions of simple inferences at minimal aggregate cost
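
For lightweight extraction, it helps to validate the model's output before passing it downstream, since a small model can occasionally return malformed or incomplete JSON. The sketch below assumes the prompt asked for a JSON object with three fields; the `Invoice` shape and `parseInvoice` name are hypothetical examples, not a schema from this page.

```typescript
interface Invoice {
  vendor: string;
  total: number;
  currency: string;
}

// Returns the extracted fields, or null when the output is not the expected JSON.
function parseInvoice(modelOutput: string): Invoice | null {
  let data: unknown;
  try {
    data = JSON.parse(modelOutput);
  } catch {
    return null; // not valid JSON at all
  }
  if (typeof data !== 'object' || data === null) return null;
  const r = data as Record<string, unknown>;
  if (
    typeof r.vendor !== 'string' ||
    typeof r.total !== 'number' ||
    !Number.isFinite(r.total) ||
    typeof r.currency !== 'string'
  ) {
    return null; // a required field is missing or has the wrong type
  }
  return { vendor: r.vendor, total: r.total, currency: r.currency };
}
```

A `null` result is a natural point to retry the request or escalate it to a larger model.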

Consider Alternatives When

  • Complex reasoning needed: GPT-5 mini or GPT-5 for tasks requiring multi-step analysis
  • Code generation: Codex mini or GPT-5 codex for coding-specific tasks
  • Deep deliberation: o3 or o4-mini for problems that benefit from chain-of-thought reasoning
  • Rich multimodal analysis: Full GPT-5 for complex vision and document understanding tasks

Conclusion

GPT-5 nano brings GPT-5 family improvements to the fastest and most affordable tier, making it the right choice for classification, routing, and high-throughput lightweight tasks through AI Gateway.

Frequently Asked Questions

  • What tasks is GPT-5 nano designed for?

    Classification, routing, autocomplete, lightweight extraction, and any high-volume workload where speed and cost outweigh the need for deep reasoning.

  • How does GPT-5 nano compare to GPT-4.1 nano?

    GPT-5 nano is the next generation, inheriting GPT-5 family improvements in quality and instruction following while maintaining the speed and cost profile expected of a nano-tier model.

  • What context window does GPT-5 nano support?

    400K tokens, which is substantial for a model at this price and speed tier.

  • Can GPT-5 nano handle long documents?

    It can read long inputs within its window of 400K tokens, but it's optimized for short outputs. For detailed analysis of long documents, consider GPT-5 mini or GPT-5.

  • How does AI Gateway handle authentication for GPT-5 nano?

    AI Gateway accepts a single API key or OIDC token for all requests. You don't embed OpenAI credentials in your application; AI Gateway routes and authenticates on your behalf.

  • What are typical latency characteristics?

    This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.