GPT-4o mini launched on July 18, 2024, as OpenAI's cost-efficient model, positioned to replace GPT-3.5 Turbo for cost-sensitive deployments while delivering meaningfully higher capability. The pricing stands out: $0.15 per million input tokens and $0.60 per million output tokens, more than 60% cheaper than GPT-3.5 Turbo. It scored 82.0% on MMLU (Massive Multitask Language Understanding), exceeding GPT-3.5 Turbo, and topped GPT-4 on the LMSYS Chatbot Arena chat preference leaderboard at release.
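At those rates, per-request cost is easy to estimate. A minimal sketch (the token counts in the example are made-up illustration values, not measurements):

```python
# GPT-4o mini pricing at launch, in USD per million tokens.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply.
print(f"${request_cost(2_000, 500):.6f}")  # $0.000600
```

At these prices, even a million such requests per day stays in the hundreds of dollars, which is what makes the high-volume patterns below practical.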
GPT-4o mini supports vision alongside text, inheriting GPT-4o's multimodal design at the small-model tier. You can run cost-efficient image analysis, document processing, visual classification, and screenshot interpretation without routing to a larger model. Function calling support makes it viable as the reasoning layer in tool-using agents and API-calling pipelines.
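Both capabilities ride on the standard Chat Completions request shape. A sketch of a request body that combines an image input with a tool definition (the image URL and the `lookup_product` tool are hypothetical placeholders; in practice this dict would be sent via the OpenAI SDK or an HTTPS POST to the chat completions endpoint):

```python
# Sketch of a Chat Completions request body for gpt-4o-mini combining
# vision input with function calling. The image URL and the
# "lookup_product" tool are hypothetical placeholders.
payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What product is shown here, and is it in stock?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/shelf.jpg"}},
            ],
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "lookup_product",
                "description": "Look up stock for a product by name.",
                "parameters": {
                    "type": "object",
                    "properties": {"name": {"type": "string"}},
                    "required": ["name"],
                },
            },
        }
    ],
}
```

When the model decides a tool is needed, the response carries a `tool_calls` entry with the function name and JSON arguments for your code to execute and feed back.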
OpenAI highlighted four patterns where GPT-4o mini excels: chaining or parallelizing multiple model calls, passing large volumes of context such as full codebases or conversation histories, fast real-time text responses for customer-facing interfaces, and workloads previously blocked by GPT-3.5 Turbo's capability ceiling. The context window of 128K tokens gives it substantial headroom for each of these.
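Before passing a full codebase or transcript, it helps to sanity-check that it fits the window. A rough sketch using the common ~4 characters-per-token heuristic for English text (exact counts require a tokenizer such as tiktoken; the reserve parameter is an assumed output budget, not an API limit):

```python
CONTEXT_WINDOW = 128_000  # GPT-4o mini's context window, in tokens

def fits_in_context(text: str, reserve_for_output: int = 4_000) -> bool:
    """Rough fit check using the ~4 chars/token heuristic for English.

    Leaves `reserve_for_output` tokens of headroom for the reply.
    For exact counts, tokenize with tiktoken instead of estimating.
    """
    estimated_tokens = len(text) // 4
    return estimated_tokens + reserve_for_output <= CONTEXT_WINDOW

# ~100K characters is roughly 25K tokens: comfortably inside the window.
print(fits_in_context("x" * 100_000))  # True
```

A check like this decides whether a document can go to the model whole or must be chunked first, which is exactly the routing decision the large-context pattern above depends on.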