GPT-4.1 nano was introduced on April 14, 2025, as the smallest and most latency-optimized model in the GPT-4.1 family. OpenAI designed it for tasks where speed and cost efficiency take priority over frontier reasoning depth: classification, autocomplete, routing decisions, and other lightweight inference workloads that need to run at high volume.
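One common pattern for these workloads is a model-tier router that sends cheap, high-volume tasks to the nano tier and reserves a larger model for harder requests. The sketch below is illustrative only: the task categories and the routing rule are assumptions, not an OpenAI-prescribed scheme (the model names `gpt-4.1-nano` and `gpt-4.1` are the published identifiers).

```python
# Hypothetical model-tier router: lightweight, high-volume task types go
# to the nano tier; everything else falls through to the full model.
# The task categories below are illustrative assumptions.

LIGHTWEIGHT_TASKS = {"classification", "autocomplete", "routing", "extraction"}

def pick_model(task_type: str) -> str:
    """Return a model identifier for the given task category."""
    if task_type in LIGHTWEIGHT_TASKS:
        return "gpt-4.1-nano"   # latency/cost-optimized tier
    return "gpt-4.1"            # full-capability tier for deeper reasoning

print(pick_model("classification"))  # → gpt-4.1-nano
print(pick_model("code-review"))     # → gpt-4.1
```

In production such a router would usually key off request metadata or a cheap upstream classifier rather than a hand-labeled task type, but the cost logic is the same.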
Despite being the entry-level tier of the GPT-4.1 family, GPT-4.1 nano posts respectable benchmark scores for its size: 80.1% on MMLU (Massive Multitask Language Understanding) and 50.3% on GPQA (Graduate-Level Google-Proof Q&A). These numbers show that the GPT-4.1 training improvements carried down to the smallest variant. Like its larger siblings, it supports the family's full 1-million-token context window, a notable capability at its price point that lets it handle tasks involving long inputs even when the outputs stay short.
GPT-4.1 nano inherits the GPT-4.1 family's 75% prompt caching discount and the removal of surcharges for long-context usage. For applications that preload a large knowledge base or system prompt once and then issue many rapid short queries against it, these economics make nano an attractive option for the query stage of a retrieval-augmented pipeline.
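The effect of the 75% caching discount on a preloaded-context workload can be sketched with simple arithmetic. The per-token rates and workload sizes below are placeholders chosen for illustration, not published pricing; only the 75% discount figure comes from the text above.

```python
# Illustrative cost model for a RAG query stage with prompt caching.
# Rates are placeholder values in dollars per million tokens.

def query_cost(context_tokens, query_tokens, output_tokens,
               input_rate, output_rate, cached=False,
               cache_discount=0.75):
    """Dollar cost of one query against a preloaded context.

    When `cached` is True, the repeated context portion is billed at
    (1 - cache_discount) of the normal input rate, reflecting the
    family's 75% prompt caching discount.
    """
    context_rate = input_rate * (1 - cache_discount) if cached else input_rate
    cost = (context_tokens * context_rate
            + query_tokens * input_rate
            + output_tokens * output_rate)
    return cost / 1_000_000

# Hypothetical workload: a 200k-token knowledge base preloaded once,
# then 1,000 short queries (100 tokens in, 50 tokens out each).
RATE_IN, RATE_OUT = 0.10, 0.40   # placeholder $/1M-token rates
uncached = sum(query_cost(200_000, 100, 50, RATE_IN, RATE_OUT)
               for _ in range(1000))
cached = sum(query_cost(200_000, 100, 50, RATE_IN, RATE_OUT, cached=True)
             for _ in range(1000))
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
# → uncached: $20.03, cached: $5.03
```

Because the repeated context dominates the bill in this regime, the discount cuts total cost by roughly 4x, which is why the caching economics matter most when many short queries share one large prompt.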