
Gemini 3 Flash

Gemini 3 Flash delivers Gemini 3's pro-grade reasoning at Flash-level latency and cost. It uses about 30% fewer tokens on average than Gemini 2.5 Pro while outperforming it on most benchmarks.

Capabilities: Reasoning, Tool Use, File Input, Vision (Image), Web Search, Tiered Cost, Implicit Caching
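index.ts
```ts
import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-3-flash',
  prompt: 'Why is the sky blue?',
})

// Print the response as it streams in (top-level await: run as an ES module)
for await (const textPart of result.textStream) process.stdout.write(textPart)
```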

Playground

Try out Gemini 3 Flash by Google. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
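As a minimal sketch of setting a provider preference, the snippet below builds a routing object to pass as providerOptions. It assumes AI Gateway's gateway.order option and the slugs vertex and google for the two providers in the table that follows; confirm both in the AI Gateway docs. The helper name is hypothetical.

```typescript
// Assumed shape of AI Gateway routing options (verify against the docs).
interface GatewayRoutingOptions {
  gateway: {
    order: string[] // provider slugs, tried in this order
  }
}

// Hypothetical helper: put the preferred provider first and keep the others
// as fallbacks in their original order.
function preferProvider(preferred: string, all: string[]): GatewayRoutingOptions {
  const rest = all.filter((slug) => slug !== preferred)
  return { gateway: { order: [preferred, ...rest] } }
}

// Assumed slugs for the two providers listed for this model.
const providerOptions = preferProvider('vertex', ['google', 'vertex'])
console.log(JSON.stringify(providerOptions))
```

The resulting object would be passed as providerOptions alongside model and prompt in a streamText call.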

| Provider | Context | Latency (P50 TTFT) | Throughput (P50) | Input | Output | Cache read | Web search (per query) | Release date | Legal |
|---|---|---|---|---|---|---|---|---|---|
| Google Vertex AI | 1M | 0.9s | 195 tps | $0.50/M | $3.00/M | $0.05/M | $14.00/K + input costs | 12/17/2025 | Terms, Privacy |
| Google | 1M | 0.5s | 162 tps | $0.50/M | $3.00/M | $0.05/M | $14.00/K + input costs | 12/17/2025 | Terms, Privacy |
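A rough per-request cost estimate using the rates in this table. It assumes flat rates (the model is tagged tiered-cost, so actual rates may vary by tier), that cached input tokens are billed at the cache-read rate instead of the input rate, that thinking tokens count as output, and that web search adds $14.00 per 1,000 queries on top of token costs. The function and field names are hypothetical.

```typescript
// Rates from the provider table above, in USD.
const RATES = {
  inputPerM: 0.5,
  outputPerM: 3.0,
  cacheReadPerM: 0.05,
  webSearchPerK: 14.0, // per 1,000 queries, on top of input token costs
}

interface Usage {
  inputTokens: number // total input tokens, including cached ones
  cachedInputTokens: number // portion of inputTokens served from cache
  outputTokens: number // assumed to include thinking tokens
  webSearchQueries: number
}

// Estimate the USD cost of one request under the flat-rate assumptions above.
function estimateCostUSD(u: Usage): number {
  const uncached = u.inputTokens - u.cachedInputTokens
  return (
    (uncached * RATES.inputPerM) / 1e6 +
    (u.cachedInputTokens * RATES.cacheReadPerM) / 1e6 +
    (u.outputTokens * RATES.outputPerM) / 1e6 +
    (u.webSearchQueries * RATES.webSearchPerK) / 1000
  )
}

const cost = estimateCostUSD({
  inputTokens: 100_000,
  cachedInputTokens: 80_000,
  outputTokens: 2_000,
  webSearchQueries: 1,
})
console.log(cost.toFixed(4)) // 0.0340
```

In this example the single web search query ($0.014) costs more than all the output tokens ($0.006), which is worth keeping in mind for search-heavy workloads.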
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Success rate of direct requests, measured across AI Gateway as a whole and per provider. Visit the docs for more info.
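The P50 figures above are medians over many requests. As a rough sketch of what that statistic means, the helpers below compute a median and a per-request tokens-per-second figure. The gateway's exact percentile method and measurement window are not stated here, so both are assumptions, and the sample values are made up.

```typescript
// Median (P50) of a set of samples. For an even count this averages the two
// middle values (one common convention; the gateway's method may differ).
function p50(samples: number[]): number {
  if (samples.length === 0) throw new Error('p50 of empty sample set')
  const sorted = [...samples].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}

// Per-request throughput in tokens per second, assuming it is measured over
// the generation window (output tokens divided by generation time).
function tokensPerSecond(outputTokens: number, generationMs: number): number {
  return outputTokens / (generationMs / 1000)
}

// Hypothetical samples.
const ttftMs = [900, 450, 1200, 500, 700]
const tps = [tokensPerSecond(390, 2000), tokensPerSecond(162, 1000), tokensPerSecond(400, 2000)]
console.log(p50(ttftMs), p50(tps)) // 700 195
```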

More models by Google

| Context | Latency (P50 TTFT) | Throughput (P50) | Input | Output | Cache read | Web search (per query) | Providers | Release date |
|---|---|---|---|---|---|---|---|---|
| 1M | 0.8s | 265 tps | $0.25/M | $1.50/M | $0.03/M | $14.00/K + input costs | Google, Vertex AI | 03/03/2026 |
| 1M | 4.5s | 171 tps | $2.00/M | $12.00/M | $0.20/M | $14.00/K + input costs | Google, Vertex AI | 02/19/2026 |
| 1M | 0.3s | 216 tps | $0.10/M | $0.40/M | $0.01/M | $35.00/K + input costs | Google, Vertex AI | 06/17/2025 |
| 1M | 0.4s | 184 tps | $0.30/M | $2.50/M | $0.03/M | $35.00/K + input costs | Google, Vertex AI | 03/20/2025 |
| 1M | 1.9s | 117 tps | $1.25/M | $10.00/M | $0.13/M | $35.00/K + input costs | Google, Vertex AI | 03/20/2025 |
| 1M | 0.4s | 149 tps | $0.15/M | $0.60/M | $0.03/M | $35.00/K + input costs | Google, Vertex AI | 12/11/2024 |

About Gemini 3 Flash

Gemini 3 Flash is Google's speed-optimized model in the Gemini 3 generation, combining Gemini 3's reasoning depth with the efficiency profile of the Flash tier. It outperforms Gemini 2.5 Pro on most benchmarks, so a speed-tier model now surpasses the previous generation's flagship. According to Google, it does this while using about 30% fewer tokens on average than Gemini 2.5 Pro and running roughly 3x faster.

Thinking is first-class in Gemini 3 Flash. The thinkingLevel provider option controls how much reasoning the model applies, and includeThoughts returns summaries of its intermediate reasoning alongside the answer. Surfacing that reasoning helps when debugging multi-step pipelines, building chain-of-thought datasets, or checking that the model works through a problem correctly. Set thinkingLevel to high when the task demands deeper inference and your latency budget allows it.
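A sketch of the provider options described above. It assumes the AI SDK's Google provider nests these settings under providerOptions.google.thinkingConfig and that Gemini 3 Flash accepts the levels minimal, low, medium, and high; check the Google provider docs for the current list. The helper and its latency-budget thresholds are hypothetical.

```typescript
// Assumed set of thinking levels for Gemini 3 Flash.
type ThinkingLevel = 'minimal' | 'low' | 'medium' | 'high'

interface GeminiThinkingOptions {
  google: {
    thinkingConfig: {
      thinkingLevel: ThinkingLevel
      includeThoughts: boolean
    }
  }
}

// Hypothetical helper: choose a thinking level from a per-request latency
// budget, and optionally ask for thought summaries (useful while debugging).
function thinkingOptionsFor(latencyBudgetMs: number, debug = false): GeminiThinkingOptions {
  const thinkingLevel: ThinkingLevel =
    latencyBudgetMs >= 10_000 ? 'high' : latencyBudgetMs >= 3_000 ? 'medium' : 'low'
  return {
    google: {
      thinkingConfig: { thinkingLevel, includeThoughts: debug },
    },
  }
}

const opts = thinkingOptionsFor(15_000, true)
console.log(JSON.stringify(opts))
```

The returned object would be passed as providerOptions to streamText. Keeping the choice in one function makes it easy to tune thresholds as you measure real latency.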

Because Gemini 3 Flash sits at the intersection of quality and throughput, it fits a wide range of real-world traffic patterns, from low-latency chat interfaces to batch document processing pipelines. Accessing it through AI Gateway adds observability, automatic retries, and provider failover without requiring a Google Cloud account.

What To Consider When Choosing a Provider

  • Configuration: Gemini 3 Flash supports configurable thinking levels, up to high, via providerOptions, giving you per-request control over how much reasoning compute the model applies.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model on direct gateway requests; bring-your-own-key (BYOK) requests are not covered. See the documentation for configuration details.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
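A sketch of how the two authentication methods could be resolved from environment variables. The names AI_GATEWAY_API_KEY and VERCEL_OIDC_TOKEN, and the API-key-first precedence, are assumptions to verify against the AI Gateway docs; the AI SDK normally reads these credentials on its own. In Node you would pass process.env instead of the literal object used here.

```typescript
type GatewayAuth = { kind: 'api-key' | 'oidc'; token: string }

// Hypothetical helper: prefer an AI Gateway API key, fall back to a Vercel
// OIDC token, and return null when neither is set.
function resolveGatewayAuth(env: Record<string, string | undefined>): GatewayAuth | null {
  const apiKey = env.AI_GATEWAY_API_KEY
  if (apiKey) return { kind: 'api-key', token: apiKey }
  const oidc = env.VERCEL_OIDC_TOKEN
  if (oidc) return { kind: 'oidc', token: oidc }
  return null
}

const auth = resolveGatewayAuth({ VERCEL_OIDC_TOKEN: 'example-token' })
console.log(auth ? `Authenticating with ${auth.kind}` : 'No AI Gateway credentials found')
```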

When to Use Gemini 3 Flash

Best For

  • Real-time chat and assistants: Interfaces that require pro-level reasoning without high latency
  • High-volume agentic pipelines: Workloads where per-token cost directly drives operating expenses
  • Step-by-step analysis: Tasks where surfacing intermediate reasoning (includeThoughts) adds value
  • Throughput-bottlenecked apps: Applications previously constrained by Gemini 2.5 Pro throughput limits
  • Cost-sensitive production workloads: Production traffic where per-token cost matters but quality still has to stay benchmark-competitive

Consider Alternatives When

  • Maximum reasoning depth: Your task requires the deepest reasoning regardless of cost or speed (consider google/gemini-3-pro-preview or google/gemini-3.1-pro-preview)
  • Native image generation needed: You require image output alongside text (consider google/gemini-3-pro-image or google/gemini-3.1-flash-image-preview)
  • Budget and latency dominate: Task quality requirements are low (consider google/gemini-3.1-flash-lite-preview)
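The guidance above can be expressed as a small routing function. This is a hypothetical sketch: the precedence order (image output first, then reasoning depth, then budget) is a design choice rather than documented behavior, and where two alternatives are listed it simply picks one of them.

```typescript
// Requirements that, per the guidance above, point away from Gemini 3 Flash.
interface TaskNeeds {
  imageOutput?: boolean // needs native image generation
  maxReasoning?: boolean // deepest reasoning regardless of cost or speed
  lowQualityBar?: boolean // budget and latency dominate
}

// Hypothetical router returning one of the model IDs listed above.
function pickGeminiModel(needs: TaskNeeds): string {
  if (needs.imageOutput) return 'google/gemini-3-pro-image'
  if (needs.maxReasoning) return 'google/gemini-3.1-pro-preview'
  if (needs.lowQualityBar) return 'google/gemini-3.1-flash-lite-preview'
  return 'google/gemini-3-flash' // default: quality and throughput balance
}

console.log(pickGeminiModel({}))
console.log(pickGeminiModel({ maxReasoning: true }))
```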

Conclusion

Gemini 3 Flash resets expectations for what a speed-tier model can deliver, matching or exceeding previous-generation Pro quality at a fraction of the cost and latency. For teams that need scalable intelligence rather than raw capability, it represents a cost- and latency-efficient entry point into the Gemini 3 generation on AI Gateway.

Frequently Asked Questions

  • What makes Gemini 3 Flash different from Gemini 2.5 Flash?

    Gemini 3 Flash is built on the newer Gemini 3 architecture rather than Gemini 2.5. The generation change brings a substantial capability lift: Gemini 3 Flash surpasses Gemini 2.5 Pro on most benchmarks, so a speed-tier model in the 3 generation now exceeds the previous generation's flagship.

  • Can I control how much the model thinks before answering?

    Yes. With the AI SDK, set thinkingLevel (e.g., 'high') to control reasoning depth and includeThoughts: true to receive summaries of the model's intermediate reasoning, both inside providerOptions.google.thinkingConfig.

  • Does Gemini 3 Flash support streaming?

    Yes. Use streamText from the AI SDK with model: 'google/gemini-3-flash' for streaming responses.
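To illustrate what consuming a stream looks like, the sketch below reads any AsyncIterable<string>, which is what the AI SDK exposes as result.textStream on a streamText result. The helper and the fake stream are hypothetical stand-ins so the example runs without network access or an API key.

```typescript
// Read text chunks from any async iterable (e.g. result.textStream), call
// onDelta for each chunk as it arrives, and resolve with the full text.
async function collectText(
  stream: AsyncIterable<string>,
  onDelta: (chunk: string) => void = () => {},
): Promise<string> {
  let text = ''
  for await (const chunk of stream) {
    onDelta(chunk)
    text += chunk
  }
  return text
}

// Stand-in for result.textStream so the sketch runs offline.
async function* fakeTextStream(): AsyncGenerator<string> {
  yield 'Rayleigh '
  yield 'scattering.'
}

collectText(fakeTextStream(), (chunk) => console.log('delta:', chunk)).then((full) =>
  console.log('full:', full),
)
```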

  • Do I need a Google Cloud account to use this model on AI Gateway?

    No. AI Gateway handles all provider authentication. You authenticate to AI Gateway using a Vercel API key or OIDC token and do not need to configure Google credentials separately.

  • How does Gemini 3 Flash compare to Gemini 3 Pro on reasoning tasks?

    Gemini 3 Pro targets the most challenging reasoning and agentic workflows. Gemini 3 Flash prioritizes speed and cost while still delivering pro-grade quality. The right tradeoff depends on your latency budget and task complexity.

  • What is Zero Data Retention and does Gemini 3 Flash support it?

    Yes. Zero Data Retention is available for this model and is offered on a per-provider basis. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.

  • What token efficiency improvements does Gemini 3 Flash offer?

    According to Google, Gemini 3 Flash uses about 30% fewer tokens on average than Gemini 2.5 Pro on typical traffic. Combined with per-token pricing well below Gemini 2.5 Pro's, this can meaningfully reduce costs at scale for applications processing large volumes of requests.
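    A back-of-the-envelope comparison of how token savings and per-token price combine. Gemini 3 Flash rates come from the provider table; the Gemini 2.5 Pro rates ($1.25/M input, $10.00/M output for prompts up to 200K tokens), the per-request token counts, and applying the 30% reduction to output tokens only are all assumptions for illustration.

```typescript
interface Prices {
  inputPerM: number // USD per million input tokens
  outputPerM: number // USD per million output tokens
}

// USD cost of `requests` calls, each using inTok input and outTok output tokens.
function workloadCostUSD(p: Prices, requests: number, inTok: number, outTok: number): number {
  return (requests * (inTok * p.inputPerM + outTok * p.outputPerM)) / 1e6
}

const gemini3Flash: Prices = { inputPerM: 0.5, outputPerM: 3.0 } // provider table
const gemini25Pro: Prices = { inputPerM: 1.25, outputPerM: 10.0 } // assumed list price

const requests = 1_000_000
const baselineOut = 2_000 // hypothetical output tokens per request on Gemini 2.5 Pro
const flashOut = Math.round(baselineOut * 0.7) // 30% fewer tokens, output only (assumption)

const before = workloadCostUSD(gemini25Pro, requests, 1_000, baselineOut)
const after = workloadCostUSD(gemini3Flash, requests, 1_000, flashOut)
console.log(before.toFixed(2), after.toFixed(2)) // 21250.00 4700.00
```

    Under these assumptions most of the saving comes from the lower output rate; the token reduction compounds it.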

  • Is Gemini 3 Flash suitable for agentic multi-step workflows?

    Yes. The model's combination of reasoning capability, token efficiency, and low latency makes it well-suited for agents that execute multiple tool calls or reasoning steps in sequence within a budget-constrained environment.