GPT-4.1 nano was introduced on April 14, 2025, as the smallest and most latency-optimized model in the GPT-4.1 family. OpenAI designed it for tasks where speed and cost efficiency take priority over frontier reasoning depth: classification, autocomplete, routing decisions, and other lightweight inference workloads that need to run at high volume.
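One common pattern for these workloads is a model-tier router that sends cheap, high-volume tasks to the nano tier and reserves a larger model for harder requests. The sketch below is illustrative only: the task categories and the routing rule are assumptions, not an OpenAI-prescribed scheme (the model names `gpt-4.1-nano` and `gpt-4.1` are the published identifiers).

```python
# Hypothetical model-tier router: lightweight, high-volume task types go
# to the nano tier; everything else falls through to the full model.
# The task categories below are illustrative assumptions.

LIGHTWEIGHT_TASKS = {"classification", "autocomplete", "routing", "extraction"}

def pick_model(task_type: str) -> str:
    """Return a model identifier for the given task category."""
    if task_type in LIGHTWEIGHT_TASKS:
        return "gpt-4.1-nano"   # latency/cost-optimized tier
    return "gpt-4.1"            # full-capability tier for deeper reasoning

print(pick_model("classification"))  # → gpt-4.1-nano
print(pick_model("code-review"))     # → gpt-4.1
```

In production such a router would usually key off request metadata or a cheap upstream classifier rather than a hand-labeled task type, but the cost logic is the same.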
Despite being the entry-level tier of the GPT-4.1 family, GPT-4.1 nano posts respectable benchmark scores for its size: 80.1% on MMLU (Massive Multitask Language Understanding) and 50.3% on GPQA (Graduate-Level Google-Proof Q&A). These numbers show that the GPT-4.1 training improvements carried down to the smallest variant. Like its larger siblings, it supports the family's full 1-million-token context window, a notable capability at its price point that lets it handle tasks involving long inputs even when the outputs stay short.
GPT-4.1 nano inherits the GPT-4.1 family's 75% prompt caching discount and the removal of surcharges for long-context usage. For applications that preload a large knowledge base or system prompt once and then issue many rapid short queries against it, these economics make nano an attractive option for the query stage of a retrieval-augmented pipeline.
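The effect of the 75% caching discount on a preloaded-context workload can be sketched with simple arithmetic. The per-token rates and workload sizes below are placeholders chosen for illustration, not published pricing; only the 75% discount figure comes from the text above.

```python
# Illustrative cost model for a RAG query stage with prompt caching.
# Rates are placeholder values in dollars per million tokens.

def query_cost(context_tokens, query_tokens, output_tokens,
               input_rate, output_rate, cached=False,
               cache_discount=0.75):
    """Dollar cost of one query against a preloaded context.

    When `cached` is True, the repeated context portion is billed at
    (1 - cache_discount) of the normal input rate, reflecting the
    family's 75% prompt caching discount.
    """
    context_rate = input_rate * (1 - cache_discount) if cached else input_rate
    cost = (context_tokens * context_rate
            + query_tokens * input_rate
            + output_tokens * output_rate)
    return cost / 1_000_000

# Hypothetical workload: a 200k-token knowledge base preloaded once,
# then 1,000 short queries (100 tokens in, 50 tokens out each).
RATE_IN, RATE_OUT = 0.10, 0.40   # placeholder $/1M-token rates
uncached = sum(query_cost(200_000, 100, 50, RATE_IN, RATE_OUT)
               for _ in range(1000))
cached = sum(query_cost(200_000, 100, 50, RATE_IN, RATE_OUT, cached=True)
             for _ in range(1000))
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
# → uncached: $20.03, cached: $5.03
```

Because the repeated context dominates the bill in this regime, the discount cuts total cost by roughly 4x, which is why the caching economics matter most when many short queries share one large prompt.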