GPT-4o mini
GPT-4o mini is OpenAI's cost-efficient multimodal model. At $0.15 per million input tokens it costs less than GPT-3.5 Turbo, while outperforming GPT-4 on chat preference benchmarks and supporting both vision and function calling.
import { streamText } from 'ai'

const result = streamText({
  model: 'openai/gpt-4o-mini',
  prompt: 'Why is the sky blue?',
})

What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
For applications that chain multiple model calls (classify, then extract, then format), GPT-4o mini's per-call cost makes it practical to run several sequential inferences per user request without the economics becoming prohibitive.
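As a hedged sketch of that classify, extract, format pattern: the `generate` parameter below stands in for a single GPT-4o mini call (for example, the AI SDK's `generateText` with model `'openai/gpt-4o-mini'`). It is injected so the chaining logic is self-contained; the prompts and the `classifyExtractFormat` name are illustrative, not a prescribed API.

```typescript
// One cheap model call per step; the signature stands in for any
// single-call helper (e.g. the AI SDK's generateText).
type Generate = (prompt: string) => Promise<string>

async function classifyExtractFormat(generate: Generate, message: string) {
  // Three sequential inferences per user request; the final call
  // consumes the results of the first two.
  const intent = await generate(`Classify the intent in one word:\n${message}`)
  const entities = await generate(`List the entities mentioned:\n${message}`)
  const formatted = await generate(
    `Format as JSON. Intent: ${intent}. Entities: ${entities}.`
  )
  return { intent, entities, formatted }
}
```

At GPT-4o mini's price point, three such calls per user action typically cost a fraction of a cent, which is what makes this pipeline shape viable at scale.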
When to Use GPT-4o mini
Best For
Customer support chatbots:
Live interaction features requiring fast, affordable multi-turn responses
Multi-call pipelines:
Sequential or parallel model calls per user action where per-call cost accumulates quickly
Budget vision workflows:
Image description, document OCR assistance, and visual classification at the small-model tier
Function-calling agents:
Reliable tool invocation at low cost per call
Large conversation histories:
Processing long codebases and extended chats within the 128K-token context window at minimal cost
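For the large-history case above, here is a minimal sketch of keeping a conversation inside the 128K-token window by dropping the oldest turns first. The 4-characters-per-token estimate is a crude heuristic, not OpenAI's tokenizer, and the `Turn` shape is illustrative.

```typescript
type Turn = { role: 'user' | 'assistant'; content: string }

// Rough heuristic: ~4 characters per token for English text.
const estimateTokens = (text: string) => Math.ceil(text.length / 4)

function trimHistory(turns: Turn[], maxTokens = 128_000): Turn[] {
  const kept: Turn[] = []
  let total = 0
  // Walk newest-first so the most recent turns survive the cut.
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = estimateTokens(turns[i].content)
    if (total + cost > maxTokens) break
    kept.unshift(turns[i])
    total += cost
  }
  return kept
}
```

In production you would reserve headroom below the full 128K for the model's output tokens and use a real tokenizer for the count.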
Consider Alternatives When
Higher quality ceiling:
GPT-4o or GPT-4.1 handle complex reasoning, nuanced writing, or difficult coding tasks better
Advanced multimodal processing:
More capable vision or audio workloads require a larger model
Deep chain-of-thought:
o1-mini is purpose-built for extended reasoning
Conclusion
GPT-4o mini arrived as the model that made it economically viable to embed language model capability into every layer of an application: not just the final user-facing response, but also classification, routing, extraction, and tool-use steps throughout a pipeline. Its combination of low price, multimodal input, function calling, and a 128K-token context window covers the majority of high-volume production use cases through AI Gateway.
FAQ
How is GPT-4o mini priced through AI Gateway?
Pricing appears on this page and updates as providers adjust their rates. AI Gateway routes traffic through the configured provider.
Does GPT-4o mini support image inputs?
Yes. It supports vision alongside text, enabling image analysis, document processing, and visual classification at the small-model cost tier.
How does GPT-4o mini perform on benchmarks?
It scores 82.0% on MMLU, outperforming comparable small models, and it topped GPT-4 on the LMSYS Chatbot Arena chat preference leaderboard at launch.
Does GPT-4o mini support function calling?
Yes. Function calling is supported, and OpenAI highlighted agentic pipelines that call external APIs as one of the key intended use cases.
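To sketch the application side of that: with function calling, the model returns a tool name plus JSON arguments, and your code runs the matching handler. The dispatcher below shows that shape; the `getWeather` tool and the `ToolCall` type are hypothetical stand-ins, not part of any SDK.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> }
type ToolHandler = (args: Record<string, unknown>) => Promise<string>

// Registry of tools the model may invoke. getWeather is a stand-in;
// a real handler would call an external API.
const tools: Record<string, ToolHandler> = {
  getWeather: async (args) => `Sunny in ${String(args.city)}`,
}

// Run the handler matching the model's tool call. The returned string
// would be sent back to the model as the tool result.
async function dispatch(call: ToolCall): Promise<string> {
  const handler = tools[call.name]
  if (!handler) throw new Error(`Unknown tool: ${call.name}`)
  return handler(call.args)
}
```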
How large is GPT-4o mini's context window?
128K tokens, providing ample space for conversation histories, long codebases, and extended document processing.
What is the difference between gpt-4o-mini and gpt-4o-mini-2024-07-18?
The alias gpt-4o-mini points to the current recommended version and may be updated. The dated snapshot gpt-4o-mini-2024-07-18 is pinned to the specific July 18, 2024 release.
Where do the performance metrics on this page come from?
This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.