Gemini 2.0 Flash
Gemini 2.0 Flash is Google's workhorse model for the agentic era. It delivers low-latency multimodal output, including natively generated images and steerable text-to-speech (TTS) audio, alongside native tool use and a Multimodal Live API for real-time streaming.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-2.0-flash',
  prompt: 'Why is the sky blue?',
})
```

Playground
Try out Gemini 2.0 Flash by Google. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
About Gemini 2.0 Flash
Google released Gemini 2.0 Flash on December 11, 2024 as the first model in the Gemini 2.0 generation, optimized for high-volume, high-frequency tasks at scale. It outperforms Gemini 1.5 Pro on key benchmarks at twice the speed, a significant capability leap over the previous Flash generation.
Gemini 2.0 Flash stands out through native multimodal output. Beyond accepting text, images, video, and audio as input, it produces natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio. This eliminates the need for separate image-generation or speech-synthesis calls, enabling tighter integration within a single request.
Google simultaneously released the Multimodal Live API alongside 2.0 Flash for real-time and interactive applications. This API adds streaming real-time audio and video input, combined tool use, and the low-latency response characteristics conversational agents and live-session experiences need. Gemini 2.0 Flash also supports native tool use including Google Search, code execution, and user-defined functions for multi-step agentic workflows.
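To illustrate the application side of user-defined function tools, here is a minimal, dependency-free sketch of the dispatch step: the model returns a tool call (a name plus JSON arguments), and the app executes the matching function and returns the result. The tool name and shape here are illustrative, not part of the Gemini or AI Gateway API.

```typescript
// A tool call as the model might emit it: a function name plus JSON args.
type ToolCall = { name: string; args: Record<string, unknown> };

// User-defined functions the app exposes to the model (names are examples).
const tools: Record<string, (args: any) => string> = {
  getWeather: ({ city }: { city: string }) => `Sunny in ${city}`,
};

// Execute the tool the model asked for and return its result,
// which the app would send back to the model in the next turn.
function dispatch(call: ToolCall): string {
  const fn = tools[call.name];
  if (!fn) throw new Error(`Unknown tool: ${call.name}`);
  return fn(call.args);
}

console.log(dispatch({ name: 'getWeather', args: { city: 'Paris' } }));
// → Sunny in Paris
```

In a real agentic loop this dispatch runs inside the SDK's tool-calling machinery rather than by hand; the sketch only shows the contract: named functions, JSON arguments in, a serializable result back.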
The context window of 1.0M tokens handles tasks that require reasoning over large codebases, lengthy documents, or extended conversation histories in a single pass.
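A rough pre-flight check helps decide whether an input needs chunking before sending it. The 4-characters-per-token heuristic and the reserved output budget below are assumptions for English text, not official figures; the exact count comes from the tokenizer.

```typescript
// Gemini 2.0 Flash's context window, in tokens.
const CONTEXT_LIMIT = 1_000_000;

// Rough estimate: ~4 characters per token is a common rule of thumb
// for English text; real counts come from the model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Leave headroom for the model's output when checking fit.
function fitsInContext(text: string, reservedForOutput = 8_192): boolean {
  return estimateTokens(text) + reservedForOutput <= CONTEXT_LIMIT;
}

console.log(fitsInContext('a'.repeat(400_000))); // ~100k tokens → true
```

For inputs near the limit, counting tokens with the provider's tokenizer API is more reliable than this heuristic.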
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
| Provider | Throughput (TPS) | TTFT (ms) | Success rate |
|---|---|---|---|

Throughput is P50 on live AI Gateway traffic, in tokens per second. TTFT is P50 time to first token on live traffic, in milliseconds. Success rate is the direct request success rate on AI Gateway, per provider. Visit the docs for more info.
More models by Google
What To Consider When Choosing a Provider
- Configuration: When selecting a provider variant, consider whether your application requires the Multimodal Live API for real-time audio and video streaming, as that capability may vary across provider endpoints.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
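The API-key path above can be sketched as a small guard that fails fast when the key is missing. `AI_GATEWAY_API_KEY` is the variable name commonly used with Vercel AI Gateway, but confirm the exact name in the docs for your setup.

```typescript
// Read the gateway key from the environment and fail fast if it's absent.
// AI_GATEWAY_API_KEY is an assumption here; check the gateway docs.
function requireGatewayKey(
  env: Record<string, string | undefined> = process.env,
): string {
  const key = env.AI_GATEWAY_API_KEY;
  if (!key) {
    throw new Error('Set AI_GATEWAY_API_KEY before making gateway requests');
  }
  return key;
}
```

Because the gateway holds the provider credentials, this is the only secret the application manages.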
When to Use Gemini 2.0 Flash
Best For
- High-frequency production workloads: You need strong benchmark performance at low latency and competitive cost
- Agentic applications: You need compositional function calling, native Google Search tool use, and multi-step planning
- Mixed-media generation: You benefit from native image and TTS audio output within a single model call rather than chained services
- Real-time interactive experiences: You use the Multimodal Live API for streaming audio/video input with sub-second response loops
- Long-context analysis: You process up to 1.0M tokens of text, video, images, audio, or code in a single context window
Consider Alternatives When
- Deep extended reasoning: Your task demands deliberate chain-of-thought reasoning, which is more central to the 2.5 generation of models
- Lowest cost per token: For very simple classification or captioning tasks, a lighter model like Gemini 2.0 Flash Lite may be more appropriate
- Dedicated embedding workloads: Your application requires only text embeddings and semantic retrieval, where a dedicated embedding model is a better architectural fit
- Strict budget constraints: Per-request cost matters more than capability, and quality parity with 1.5 Flash is sufficient for your use case
Conclusion
Gemini 2.0 Flash marks a generational upgrade in what a workhorse model can do. It brings native multimodal output (images and audio) and real-time streaming into a single, high-throughput package. Teams building agentic pipelines, interactive media applications, or large-scale inference workloads get a model designed from the ground up for production AI in the agentic era.
Frequently Asked Questions
What makes Gemini 2.0 Flash different from 1.5 Flash?
Gemini 2.0 Flash adds native multimodal output (images and steerable TTS audio), native tool use (Google Search, code execution, user-defined functions), and the Multimodal Live API for real-time streaming, while maintaining similar latency to 1.5 Flash and outperforming 1.5 Pro on key benchmarks.
What is the Multimodal Live API and does AI Gateway support it?
The Multimodal Live API is a streaming interface released alongside 2.0 Flash. It supports real-time audio and video input with combined tool use. Check the AI Gateway documentation and your provider (`vertex` or `google`) for current Live API support.
Can Gemini 2.0 Flash generate images and audio in the same response as text?
Yes. Gemini 2.0 Flash produces natively generated images and steerable text-to-speech audio alongside text in a single response, without requiring separate generation calls.
How does the context window of 1.0M tokens affect prompt construction?
With 1.0M tokens, you can pass entire codebases, long PDF documents, hours of transcripts, or extended conversation histories in a single context, eliminating the need to chunk or summarize inputs for most practical workloads.
What native tools can Gemini 2.0 Flash call?
Gemini 2.0 Flash natively supports Google Search, code execution, and user-defined functions, enabling it to fetch live information, run and test code, and call external APIs within a single inference pass.
Is Gemini 2.0 Flash suitable for building Project Astra-style universal assistant experiences?
Yes. Google uses Gemini 2.0 Flash as the foundation for Project Astra prototypes, which rely on its multimodal reasoning, native tool use, low latency, and multi-language conversational capabilities.
How does Zero Data Retention work with this model through AI Gateway?
Zero Data Retention is available for this model. ZDR on AI Gateway applies to direct gateway requests; BYOK flows aren't covered. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.
What safety measures are built into Gemini 2.0 Flash?
Gemini 2.0 Flash uses reinforcement learning to critique its own responses and improve handling of sensitive prompts. Google also runs automated red teaming to assess risks including indirect prompt injection attacks.