
GPT-4.1 nano

GPT-4.1 nano is the smallest and fastest model in the GPT-4.1 family, designed for high-volume, low-latency tasks such as classification, autocomplete, and routing. It delivers strong MMLU results at the lowest price point in the GPT-4.1 lineup.

File Input · Tool Use · Vision (Image) · Implicit Caching
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'openai/gpt-4.1-nano',
  prompt: 'Why is the sky blue?',
})

// Print the response as text chunks arrive.
for await (const part of result.textStream) {
  process.stdout.write(part)
}

Frequently Asked Questions

  • What tasks is GPT-4.1 nano specifically designed for?

    OpenAI designed it for classification, autocomplete, and routing where response speed and low cost outweigh the need for frontier reasoning.

  • Does GPT-4.1 nano really support a context window of 1.0M tokens?

    Yes. All three GPT-4.1 family members share a 1-million-token context window, which is unusual for a model at nano's price and speed tier.

  • What benchmark scores does GPT-4.1 nano achieve?

    At launch, GPT-4.1 nano scored 80.1% on MMLU and 50.3% on GPQA, showing that the family's training improvements extended to the smallest variant.

  • How does GPT-4.1 nano's pricing compare to the rest of the GPT-4.1 family?

    See the pricing section on this page for today's rates. AI Gateway exposes each provider's pricing for GPT-4.1 nano.

  • Is GPT-4.1 nano suitable as the query model in a RAG pipeline?

    Yes. Pairing a large preloaded knowledge base or system prompt (benefiting from the 75% cache discount) with rapid, inexpensive nano queries is a practical pattern for retrieval-augmented generation at scale.
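    One way to structure this pattern is a sketch like the following, where the large shared knowledge base lives in the system prompt (kept byte-identical across calls so repeated requests can benefit from implicit caching) and only the short user question varies. `buildRagRequest` and `KNOWLEDGE_BASE` are illustrative assumptions, not part of the `ai` SDK:

```typescript
// Sketch of the RAG query pattern: stable cached prefix + cheap nano queries.
// `buildRagRequest` and KNOWLEDGE_BASE are illustrative, not SDK APIs.
const KNOWLEDGE_BASE = `...large preloaded context, identical across calls...`

function buildRagRequest(question: string) {
  return {
    model: 'openai/gpt-4.1-nano',
    // Stable prefix: eligible for the cache discount after the first call.
    system: `Answer using only this context:\n${KNOWLEDGE_BASE}`,
    // Variable suffix: the per-query question.
    prompt: question,
  }
}
```

    The returned options object has the same shape as a `generateText` call in the `ai` package, so the heavy context is paid for once and each follow-up query stays fast and inexpensive.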

  • When should I use nano versus mini versus GPT-4.1?

    Nano: classification, routing, autocomplete. Mini: GPT-4o-class quality with lower cost and latency. GPT-4.1: maximum coding and instruction-following accuracy.

  • What are typical latency characteristics?

    This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.
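The nano / mini / GPT-4.1 split described above can be encoded as a small routing helper. This is a sketch: the task categories and `pickModel` are illustrative assumptions, while the model IDs are the Gateway slugs used on this page.

```typescript
// Illustrative model router for the GPT-4.1 family. Task categories are
// assumptions for this sketch; model IDs match the AI Gateway slugs.
type Task = 'classification' | 'routing' | 'autocomplete' | 'chat' | 'coding'

function pickModel(task: Task): string {
  switch (task) {
    case 'classification':
    case 'routing':
    case 'autocomplete':
      return 'openai/gpt-4.1-nano' // cheapest and fastest tier
    case 'chat':
      return 'openai/gpt-4.1-mini' // GPT-4o-class quality at lower cost
    case 'coding':
      return 'openai/gpt-4.1' // strongest coding and instruction following
  }
}
```

The string returned by `pickModel` can be passed directly as the `model` option in `streamText` or `generateText` calls against the Gateway, as in the `index.ts` snippet above.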