Gemini 2.5 Flash

Gemini 2.5 Flash is Google's first fully hybrid reasoning model, letting developers toggle thinking on or off and set thinking budgets to tune the balance between quality, cost, and latency, all on top of the fast, multimodal foundation of 2.0 Flash.

Capabilities: File Input, Reasoning, Tool Use, Vision (Image), Web Search, Implicit Caching
index.ts
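import { streamText } from 'ai'
// The model is addressed by its AI Gateway slug.
const result = streamText({
  model: 'google/gemini-2.5-flash',
  prompt: 'Why is the sky blue?'
})
// Print the response as it streams in.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}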

Playground

Try out Gemini 2.5 Flash by Google. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
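As a rough illustration of setting a provider preference, the sketch below builds a per-request routing object. The `gateway.order` shape and the slugs `vertex` and `google` are assumptions made for this example, and `preferProviders` is a hypothetical helper; copy the real slugs from the providers table and confirm the option name in the docs.

```typescript
// Hypothetical helper: builds routing options for one request.
// ASSUMPTION: the gateway reads an ordered list of provider slugs from `gateway.order`.
type GatewayRouting = { gateway: { order: string[] } };

function preferProviders(...slugs: string[]): GatewayRouting {
  if (slugs.length === 0) {
    throw new Error('at least one provider slug is required');
  }
  // Drop duplicate slugs while keeping the caller's priority order.
  const unique = slugs.filter((slug, index) => slugs.indexOf(slug) === index);
  return { gateway: { order: unique } };
}

// ASSUMPTION: 'vertex' and 'google' stand in for the copied provider slugs.
const routing = preferProviders('vertex', 'google');
console.log(JSON.stringify(routing));
```

An object like this would typically be passed as the `providerOptions` argument next to `model` and `prompt`.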

| Provider | Legal | Context | Latency | Throughput | Input | Output | Cache | Web Search (Per Query) | Release Date |
|---|---|---|---|---|---|---|---|---|---|
| Google Vertex AI | Terms, Privacy | 1M | 0.5s | 151tps | $0.30/M | $2.50/M | Read: $0.03/M | $35.00/K + input costs | 03/20/2025 |
| Google | Terms, Privacy | 1M | 0.5s | 196tps | $0.30/M | $2.50/M | Read: $0.03/M | $35.00/K + input costs | 03/20/2025 |
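To make the listed rates concrete, here is a minimal cost sketch using the figures above ($0.30/M input, $2.50/M output, $0.03/M cache read, $35.00 per 1,000 web search queries). The `Usage` shape and `estimateCostUsd` are hypothetical names, and the sketch assumes cached tokens are a subset of input tokens billed at the cache-read rate instead of the input rate; actual billing may differ.

```typescript
// Rates copied from the provider table, in USD.
const RATES = {
  inputPerM: 0.30,
  outputPerM: 2.50,
  cacheReadPerM: 0.03,
  webSearchPerK: 35.0,
};

// Hypothetical usage record for one request.
interface Usage {
  inputTokens: number;        // total input tokens, including cached ones
  outputTokens: number;
  cachedInputTokens?: number; // ASSUMPTION: subset of inputTokens served from cache
  webSearchQueries?: number;
}

function estimateCostUsd(u: Usage): number {
  const cached = u.cachedInputTokens ?? 0;
  const queries = u.webSearchQueries ?? 0;
  const freshInput = u.inputTokens - cached;
  return (
    (freshInput / 1e6) * RATES.inputPerM +
    (cached / 1e6) * RATES.cacheReadPerM +
    (u.outputTokens / 1e6) * RATES.outputPerM +
    (queries / 1000) * RATES.webSearchPerK
  );
}

// 80k fresh input + 20k cached input + 5k output + 2 searches
// = 0.024 + 0.0006 + 0.0125 + 0.07 USD
const cost = estimateCostUsd({
  inputTokens: 100_000,
  cachedInputTokens: 20_000,
  outputTokens: 5_000,
  webSearchQueries: 2,
});
console.log(cost.toFixed(4)); // prints "0.1071"
```

In this example the two web searches account for most of the cost, which is worth remembering when search grounding is enabled on short prompts.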
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Google

| Context | Latency | Throughput | Input | Output | Cache | Web Search (Per Query) | Providers | Release Date |
|---|---|---|---|---|---|---|---|---|
| 1M | 0.8s | 247tps | $0.25/M | $1.50/M | Read: $0.03/M | $14.00/K + input costs | Google, Vertex | 03/03/2026 |
| 1M | 3.4s | 154tps | $2.00/M | $12.00/M | Read: $0.20/M | $14.00/K + input costs | Google, Vertex | 02/19/2026 |
| 1M | 0.8s | 183tps | $0.50/M | $3.00/M | Read: $0.05/M | $14.00/K + input costs | Google, Vertex | 12/17/2025 |
| 1M | 0.4s | 200tps | $0.10/M | $0.40/M | Read: $0.01/M | $35.00/K + input costs | Google, Vertex | 06/17/2025 |
| 1M | 1.8s | 126tps | $1.25/M | $10.00/M | Read: $0.13/M | $35.00/K + input costs | Google, Vertex | 03/20/2025 |
| 1M | 0.5s | 135tps | $0.15/M | $0.60/M | Read: $0.03/M | $35.00/K + input costs | Google, Vertex | 12/11/2024 |

About Gemini 2.5 Flash

Gemini 2.5 Flash builds directly on the 2.0 Flash foundation, carrying forward its speed and cost characteristics while adding a major reasoning upgrade. It launched in preview as Google's first fully hybrid reasoning model, a classification that sets it apart from both the 2.0 Flash generation and pure thinking models.

The hybrid design means thinking is not always on. You can disable thinking entirely to maintain 2.0 Flash response speed, or enable it and set thinking budgets to control how much deliberation the model applies before answering. With thinking on, 2.5 Flash shows meaningful performance improvements over the 2.0 generation on reasoning-intensive tasks. Its performance-to-cost ratio places it on the Pareto frontier, competitive on quality without requiring the full resource commitment of 2.5 Pro. This makes it well-suited for applications where some prompts are routine and some are complex, and you want a single model that adapts accordingly.
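The on/off toggle and the budget described above can be sketched as a small options builder. This assumes the Google provider reads a `thinkingConfig.thinkingBudget` value (in tokens) from `providerOptions.google`, with 0 meaning thinking is disabled; `thinkingOptions` is a hypothetical helper, so check the provider docs for the exact option names and allowed ranges.

```typescript
// ASSUMPTION: option shape read by the Google provider; verify against the provider docs.
type ThinkingOptions = {
  google: { thinkingConfig: { thinkingBudget: number } };
};

// Hypothetical helper: a budget of 0 disables thinking, a positive value caps it.
function thinkingOptions(budgetTokens: number): ThinkingOptions {
  if (budgetTokens < 0 || Math.floor(budgetTokens) !== budgetTokens) {
    throw new RangeError('budgetTokens must be a non-negative integer');
  }
  return { google: { thinkingConfig: { thinkingBudget: budgetTokens } } };
}

const fast = thinkingOptions(0);          // thinking off, lowest latency
const deliberate = thinkingOptions(2048); // up to 2048 thinking tokens (illustrative value)

// Either object would be passed alongside the model and prompt, for example:
// streamText({ model: 'google/gemini-2.5-flash', prompt, providerOptions: deliberate })
console.log(fast.google.thinkingConfig.thinkingBudget);
console.log(deliberate.google.thinkingConfig.thinkingBudget);
```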

Gemini 2.5 Flash also integrates with tools including Google Search and code execution, and accepts multimodal input across text, images, video, and audio. The context window is 1M tokens, maintaining the long-context capability of the 2.0 Flash generation.
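For multimodal input, a request usually combines a text part and a media part in one user message. The sketch below builds such a message for an image; the part shapes mirror the AI SDK's text and image content parts as an assumption, the types are declared locally so the example stands alone, and `imageQuestion` and the example URL are hypothetical.

```typescript
// Locally declared message types (ASSUMPTION: modeled on AI SDK content parts).
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image'; image: string };
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> };

// Hypothetical helper: pairs a question with an image referenced by URL.
function imageQuestion(question: string, imageUrl: string): UserMessage {
  if (!/^https?:\/\//.test(imageUrl)) {
    throw new Error('imageUrl must be an http(s) URL');
  }
  return {
    role: 'user',
    content: [
      { type: 'text', text: question },
      { type: 'image', image: imageUrl },
    ],
  };
}

// The URL is a placeholder for illustration only.
const message = imageQuestion(
  'What trend does this chart show?',
  'https://example.com/chart.png',
);
console.log(message.content.map((part) => part.type).join(','));
```

A message like this would go in the `messages` array of a request in place of a plain `prompt` string.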

What To Consider When Choosing a Provider

  • Configuration: Thinking budgets affect token consumption and latency, so evaluate provider rate limits and pricing tiers with thinking-enabled requests before committing to a provider variant at production scale.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
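The two authentication paths can be sketched as a small credential lookup. The environment variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN` are assumptions here, `resolveCredential` is a hypothetical helper, and in practice the SDK is expected to perform this lookup itself.

```typescript
type Env = { [name: string]: string | undefined };
type Credential = { kind: 'api-key' | 'oidc'; token: string };

// Hypothetical helper: prefers an explicit API key, then falls back to an OIDC token.
// ASSUMPTION: the variable names below match the gateway's conventions.
function resolveCredential(env: Env): Credential {
  const apiKey = env['AI_GATEWAY_API_KEY'];
  if (apiKey) return { kind: 'api-key', token: apiKey };

  const oidcToken = env['VERCEL_OIDC_TOKEN'];
  if (oidcToken) return { kind: 'oidc', token: oidcToken };

  throw new Error('no AI Gateway credential found in the environment');
}

// In a Node.js process you would pass process.env; a literal is used here for illustration.
const credential = resolveCredential({ VERCEL_OIDC_TOKEN: 'example-oidc-token' });
console.log(credential.kind);
```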

When to Use Gemini 2.5 Flash

Best For

  • Workloads with mixed complexity: Applications that serve both simple requests and hard reasoning problems benefit from the ability to set per-request thinking budgets rather than paying for full reasoning on every call
  • Reasoning-intensive pipelines: Multi-step math, science, coding, or logic tasks where 2.0 Flash was fast enough but not accurate enough
  • Cost-conscious agentic applications: Agents that need chain-of-thought planning but cannot absorb 2.5 Pro pricing across high request volumes
  • Coding and code transformation tasks: Code generation, refactoring, and agentic coding workflows that benefit from the reasoning capabilities introduced in the 2.5 generation
  • Multimodal reasoning: Images, video, or audio inputs that require more nuanced analysis than pattern matching
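For the mixed-complexity case in the list above, one possible approach is to pick a thinking budget per request from a cheap heuristic. The sketch below is purely illustrative: `pickThinkingBudget`, the keyword list, the 2,000-character threshold, and the 4,096-token ceiling are all assumptions, and a production router would more likely use a classifier or task metadata.

```typescript
// Hypothetical heuristic: returns a thinking budget in tokens for one prompt.
function pickThinkingBudget(prompt: string, maxBudget: number = 4096): number {
  // Words that tend to signal multi-step reasoning (illustrative, not exhaustive).
  const hardSignals = /\b(prove|derive|debug|refactor|optimi[sz]e|step[- ]by[- ]step)\b/i;

  if (hardSignals.test(prompt)) {
    return maxBudget; // likely hard: allow the full budget
  }
  if (prompt.length > 2000) {
    return Math.floor(maxBudget / 4); // long but not obviously hard: small budget
  }
  return 0; // routine request: thinking disabled
}

console.log(pickThinkingBudget('What is the capital of France?'));    // prints 0
console.log(pickThinkingBudget('Prove that sqrt(2) is irrational.')); // prints 4096
```

The returned number would then be used as the per-request thinking budget, so routine calls skip thinking entirely while harder calls get the full allowance.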

Consider Alternatives When

  • Deepest reasoning required: Highly complex problems with no speed or cost constraint, where 2.5 Pro's stronger benchmark scores may justify the premium
  • Uniform high-volume inference: Entirely low-complexity workloads where thinking overhead adds cost without benefit, making 2.5 Flash-Lite or 2.0 Flash-Lite more appropriate
  • Native image or audio output: 2.5 Flash outputs text only, so media generation needs a different model

Conclusion

Gemini 2.5 Flash introduces a new dimension of control to the Flash model family. You can dial reasoning depth from zero to a configured budget, matching compute expenditure to actual task complexity. It retains the efficiency that made Flash popular while unlocking the reasoning quality that previously required a heavier model.

Frequently Asked Questions

  • What does "hybrid reasoning" mean for Gemini 2.5 Flash?

    It means the model operates in two modes: with thinking disabled (behaving like a fast response model comparable to 2.0 Flash) or with thinking enabled at a configurable budget, where it reasons through the problem before generating an answer.

  • How do thinking budgets work?

    You set a per-request parameter that controls how much deliberation the model applies before responding. A higher budget allows more reasoning steps, improving accuracy on complex tasks at the cost of more tokens and higher latency. A lower budget favors speed and cost.

  • If I disable thinking, how does 2.5 Flash compare to 2.0 Flash?

    2.5 Flash outperforms 2.0 Flash even with thinking disabled. The 2.5 base model is stronger regardless of thinking mode.

  • Does Gemini 2.5 Flash support Google Search tool use?

    Yes. Google Search and code execution are shared capabilities across all Gemini 2.5 models, including Gemini 2.5 Flash.

  • What is the context window for Gemini 2.5 Flash?

    The context window is 1M tokens.

  • How is 2.5 Flash positioned relative to 2.5 Pro?

    Gemini 2.5 Flash sits at the Pareto frontier of cost and performance. It delivers strong reasoning at a lower cost than 2.5 Pro, which targets the most complex tasks and posts stronger benchmark scores.

  • Is Gemini 2.5 Flash generally available?

    It launched in preview on March 20, 2025. Google later promoted it to stable general availability alongside 2.5 Pro as part of the Gemini 2.5 family expansion.