
Gemini 3 Flash

Gemini 3 Flash delivers Gemini 3's pro-grade reasoning at flash-level latency and cost, using 30% fewer tokens than the previous-generation Gemini 2.5 models while outperforming them across most benchmarks.

Capabilities: Reasoning · Tool Use · File Input · Vision (Image) · Web Search · Tiered Cost · Implicit Caching
index.ts

```typescript
import { streamText } from 'ai'

// Stream a response from Gemini 3 Flash via AI Gateway.
const result = streamText({
  model: 'google/gemini-3-flash',
  prompt: 'Why is the sky blue?',
})

// Print the response text as it arrives.
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}
```

Frequently Asked Questions

  • What makes Gemini 3 Flash different from Gemini 2.5 Flash?

    Gemini 3 Flash is built on the newer Gemini 3 architecture rather than Gemini 2.5. The generation change brings a substantial capability lift: Gemini 3 Flash surpasses Gemini 2.5 Pro on most benchmarks, so a speed-tier model in the 3 generation now exceeds the previous generation's flagship.

  • Can I control how much the model thinks before answering?

    Yes. You can set thinkingLevel (e.g., 'high') and includeThoughts: true inside providerOptions.google when using the AI SDK. This gives you visibility into intermediate reasoning steps.
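A minimal sketch of that configuration, assuming AI SDK v5 (the exact shape in which returned reasoning is exposed may vary by SDK version):

```typescript
import { generateText } from 'ai'

// Ask Gemini 3 Flash to think at a high level and surface its thoughts.
const result = await generateText({
  model: 'google/gemini-3-flash',
  prompt: 'Outline a rollout plan for a breaking API change.',
  providerOptions: {
    google: {
      thinkingLevel: 'high',  // how much the model reasons before answering
      includeThoughts: true,  // include intermediate reasoning in the response
    },
  },
})

console.log(result.text)
```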

  • Does Gemini 3 Flash support streaming?

    Yes. Use streamText from the AI SDK with model: 'google/gemini-3-flash' for streaming responses.

  • Do I need a Google Cloud account to use this model on AI Gateway?

    No. AI Gateway handles all provider authentication. You authenticate to AI Gateway using a Vercel API key or OIDC token and do not need to configure Google credentials separately.
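A sketch of explicit gateway authentication, assuming the `@ai-sdk/gateway` provider and the `AI_GATEWAY_API_KEY` environment variable (check the gateway docs for the current names; plain model strings like `'google/gemini-3-flash'` also resolve through the gateway automatically):

```typescript
import { createGateway } from '@ai-sdk/gateway'
import { generateText } from 'ai'

// Authenticate to AI Gateway directly; no Google credentials involved.
const gateway = createGateway({
  apiKey: process.env.AI_GATEWAY_API_KEY, // Vercel API key (or OIDC on Vercel)
})

const { text } = await generateText({
  model: gateway('google/gemini-3-flash'),
  prompt: 'Say hello.',
})

console.log(text)
```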

  • How does Gemini 3 Flash compare to Gemini 3 Pro on reasoning tasks?

    Gemini 3 Pro targets the most challenging reasoning and agentic workflows. Gemini 3 Flash prioritizes speed and cost while still delivering pro-grade quality. The right tradeoff depends on your latency budget and task complexity.

  • What is Zero Data Retention and does Gemini 3 Flash support it?

    Zero Data Retention (ZDR) means prompts and responses are not persisted after a request is processed, and it is available for this model. ZDR on AI Gateway applies to direct gateway requests; BYOK flows aren't covered. See https://vercel.com/docs/ai-gateway/capabilities/zdr for configuration details.

  • What token efficiency improvements does Gemini 3 Flash offer?

    Gemini 3 Flash uses 30% fewer tokens than previous Gemini 2.5 models. Combined with lower per-token pricing, this results in meaningful cost reductions at scale for applications processing large volumes of requests.
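As a back-of-the-envelope illustration of how the two effects compound (the prices below are hypothetical placeholders, not real rates; see the pricing page for actual tiered pricing):

```typescript
// Hypothetical numbers for illustration only.
const oldTokensPerRequest = 1_000
const newTokensPerRequest = oldTokensPerRequest * 0.7 // 30% fewer tokens

// Cost of a request given a price per million tokens.
function cost(tokens: number, pricePerMillion: number): number {
  return (tokens / 1_000_000) * pricePerMillion
}

// Token savings and a (hypothetical) per-token price drop multiply together.
const oldCost = cost(oldTokensPerRequest, 1.0) // $1.00 / 1M tokens (hypothetical)
const newCost = cost(newTokensPerRequest, 0.8) // $0.80 / 1M tokens (hypothetical)

console.log(newCost / oldCost) // 0.7 * 0.8 = 0.56, i.e. ~44% cheaper per request
```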

  • Is Gemini 3 Flash suitable for agentic multi-step workflows?

    Yes. The model's combination of reasoning capability, token efficiency, and low latency makes it well-suited for agents that execute multiple tool calls or reasoning steps in sequence within a budget-constrained environment.
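A sketch of such a budget-constrained agent loop with AI SDK v5's tool-calling API (the `getWeather` tool and its stubbed result are hypothetical stand-ins for a real integration):

```typescript
import { generateText, tool, stepCountIs } from 'ai'
import { z } from 'zod'

const result = await generateText({
  model: 'google/gemini-3-flash',
  prompt: 'What is the weather in Lisbon, and should I pack a jacket?',
  tools: {
    // Hypothetical tool: replace the stub with a real weather API call.
    getWeather: tool({
      description: 'Get the current temperature for a city, in Celsius',
      inputSchema: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, tempC: 21 }),
    }),
  },
  stopWhen: stepCountIs(5), // budget: cap the agent at five model/tool steps
})

console.log(result.text)
```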