GPT-4.1 nano
GPT-4.1 nano is the smallest and fastest model in the GPT-4.1 family. It is designed for high-volume, low-latency tasks such as classification, autocomplete, and routing, and it delivers strong MMLU results at the lowest price point in the GPT-4.1 lineup.
```ts
import { streamText } from 'ai'

const result = streamText({
  model: 'openai/gpt-4.1-nano',
  prompt: 'Why is the sky blue?',
})
```

Frequently Asked Questions
What tasks is GPT-4.1 nano specifically designed for?
OpenAI designed it for classification, autocomplete, and routing where response speed and low cost outweigh the need for frontier reasoning.
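For classification in particular, a common pattern is to constrain the model to a fixed label set and validate its reply before trusting it. The sketch below illustrates that pattern; `classifyPrompt`, `parseLabel`, and the label list are hypothetical helpers, not part of the SDK:

```ts
// Hypothetical label set for a support-ticket classifier.
const LABELS = ['billing', 'technical', 'other'] as const
type Label = (typeof LABELS)[number]

// Builds a prompt that asks GPT-4.1 nano to answer with one label only.
function classifyPrompt(ticket: string): string {
  return (
    `Classify the support ticket into one of: ${LABELS.join(', ')}.\n` +
    `Reply with the label only.\n\nTicket: ${ticket}`
  )
}

// Validates the model's raw text reply against the allowed labels.
function parseLabel(reply: string): Label | null {
  const cleaned = reply.trim().toLowerCase()
  return (LABELS as readonly string[]).includes(cleaned) ? (cleaned as Label) : null
}
```

The prompt string would be passed as `prompt` to `streamText` or `generateText` with `model: 'openai/gpt-4.1-nano'`, and the reply run through `parseLabel` before use.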
Does GPT-4.1 nano really support a context window of 1.0M tokens?
Yes. All three GPT-4.1 family members share the same 1M-token context window, which is unusual for a model at nano's price and speed tier.
What benchmark scores does GPT-4.1 nano achieve?
At launch, GPT-4.1 nano scored 80.1% on MMLU and 50.3% on GPQA, showing that the family's training improvements extended to the smallest variant.
How does GPT-4.1 nano's pricing compare to the rest of the GPT-4.1 family?
See the pricing section on this page for today's rates. AI Gateway exposes each provider's pricing for GPT-4.1 nano.
Is GPT-4.1 nano suitable as the query model in a RAG pipeline?
Yes. Pairing a large preloaded knowledge base or system prompt (benefiting from the 75% cache discount) with rapid, inexpensive nano queries is a practical pattern for retrieval-augmented generation at scale.
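One way to sketch that pattern: keep the large knowledge base in a stable system prompt, so repeated calls can reuse the cached prefix, and vary only the short user query per call. The `buildRagRequest` helper below is a hypothetical illustration of that request shape, not an SDK API:

```ts
// Hypothetical sketch of a cache-friendly RAG call shape: the large,
// unchanging knowledge base goes in the system message (the stable
// prefix eligible for prompt caching), while each cheap nano call
// varies only the question.
interface RagRequest {
  model: string
  system: string
  prompt: string
}

function buildRagRequest(knowledgeBase: string, question: string): RagRequest {
  return {
    model: 'openai/gpt-4.1-nano',
    // Stable prefix: identical across calls, so it can be cached.
    system: `Answer strictly from the context below.\n\nContext:\n${knowledgeBase}`,
    // Variable suffix: the only part that changes per query.
    prompt: question,
  }
}
```

The resulting object matches the `model` / `system` / `prompt` options shown in the snippet at the top of this page.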
When should I use nano versus mini versus GPT-4.1?
Nano: classification, routing, autocomplete. Mini: GPT-4o-class quality with lower cost and latency. GPT-4.1: maximum coding and instruction-following accuracy.
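That decision can be expressed as a simple router. The task categories and mapping below just restate the guidance above as code and are an assumption, not an official policy:

```ts
// Hypothetical tier router: send each task category to the cheapest
// GPT-4.1 variant that fits it, per the guidance above.
type Task = 'classification' | 'routing' | 'autocomplete' | 'general-chat' | 'coding'

function pickModel(task: Task): string {
  switch (task) {
    case 'classification':
    case 'routing':
    case 'autocomplete':
      return 'openai/gpt-4.1-nano' // speed and cost over frontier reasoning
    case 'general-chat':
      return 'openai/gpt-4.1-mini' // GPT-4o-class quality, lower cost
    case 'coding':
      return 'openai/gpt-4.1' // maximum coding / instruction-following accuracy
  }
}
```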
What are typical latency characteristics?
This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.