
Gemini 2.5 Flash Lite Preview 09-2025

Gemini 2.5 Flash Lite Preview 09-2025 is Google's September 2025 preview of the next Flash Lite generation, delivering better instruction following, up to 50% fewer output tokens, and improved multimodal understanding including audio transcription and image analysis.

Capabilities: File Input, Reasoning, Tool Use, Vision (Image), Web Search, Implicit Caching
index.ts
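import { streamText } from 'ai'
const result = streamText({
  model: 'google/gemini-2.5-flash-lite-preview-09-2025',
  prompt: 'Why is the sky blue?',
})
// Print the answer to stdout as it streams in
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}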

Playground

Try out Gemini 2.5 Flash Lite Preview 09-2025 by Google. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

About Gemini 2.5 Flash Lite Preview 09-2025

Gemini 2.5 Flash Lite Preview 09-2025 is a preview release from Google dated September 25, 2025. It gives you early access to the next Flash Lite tier. The preview builds on the stable Gemini 2.5 Flash Lite with three focused improvements.

First, instruction following. The preview handles complex instructions and system prompts more reliably. You'll see a smaller gap between what you ask for and what you get back. Second, verbosity. Google reported up to a 50% reduction in output tokens compared to the current stable Flash Lite. Fewer tokens means lower cost and faster responses for the same task. Third, multimodal capabilities. Audio transcription, image understanding, and translation all improved.
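
To see what fewer output tokens means in dollars, the sketch below prices a workload at the input and output rates listed in the provider table on this page ($0.10/M input, $0.40/M output). The helper name and the monthly token volumes are assumptions chosen for illustration, and the 50% case is Google's "up to" figure, not a guaranteed saving.

```typescript
// Per-token rates from the provider table on this page, in USD per 1M tokens.
// Providers can change these; treat them as inputs, not properties of the model.
const INPUT_USD_PER_M = 0.1
const OUTPUT_USD_PER_M = 0.4

// Hypothetical helper: cost of a workload at the listed input/output rates.
// Cache reads and web search queries are billed separately and ignored here.
function workloadCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_USD_PER_M +
    (outputTokens / 1_000_000) * OUTPUT_USD_PER_M
  )
}

// Assumed monthly volume: 100M input tokens and 40M output tokens.
const baseline = workloadCostUsd(100_000_000, 40_000_000)
// Same prompts if output shrinks by the full 50% (Google's "up to" figure).
const reduced = workloadCostUsd(100_000_000, 20_000_000)

console.log(`baseline $${baseline.toFixed(2)}, reduced $${reduced.toFixed(2)}`)
```

With these assumed volumes the bill drops from $26.00 to $18.00, about 31% rather than 50%, because input tokens cost the same either way. The more output-heavy the workload, the closer the total saving gets to the output-token reduction.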

The preview exists to gather developer feedback; it does not replace the stable Flash Lite. Alongside these previews, Google introduced a -latest alias system (e.g., gemini-flash-lite-latest). These aliases always point to the newest version, with a two-week deprecation notice before the version behind them changes. Preview models rotate, so pin to the explicit model string gemini-2.5-flash-lite-preview-09-2025 if you need consistent behavior during evaluation.
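
One way to apply the pinning advice is to resolve the model id in a single place, so production always uses the dated string while other environments can opt into something newer. The helper and environment names below are hypothetical, and because this page does not list an AI Gateway identifier for the -latest alias, the override is just a caller-supplied string rather than a documented id.

```typescript
// Explicit, dated identifier from this page. Pinning it keeps behavior stable
// while you evaluate; alias targets move when Google promotes a new version.
const PINNED_MODEL = 'google/gemini-2.5-flash-lite-preview-09-2025'

type Env = 'production' | 'staging' | 'development'

// Hypothetical helper: production always gets the pinned string. Other
// environments may pass an override (for example an alias id) to try newer versions.
function resolveModelId(env: Env, override?: string): string {
  if (env === 'production' || !override) {
    return PINNED_MODEL
  }
  return override
}

console.log(resolveModelId('production', 'some-alias-id'))
console.log(resolveModelId('staging', 'some-alias-id'))
```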

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
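
As a sketch of setting that preference: AI Gateway documents a `gateway` entry in `providerOptions` whose `order` array lists provider slugs to try in turn. The slug strings 'google' and 'vertex' are assumptions here (copy the real ones from the table), and the helper name is hypothetical; verify the option shape against the current AI Gateway docs.

```typescript
// Provider slugs for the two providers in the table below. The exact slug
// strings are assumptions; copy them from the table on this page.
type ProviderSlug = 'google' | 'vertex'

// Hypothetical helper: provider options asking AI Gateway to try providers
// in the given order, moving to the next one if a request fails.
function gatewayProviderOrder(order: ProviderSlug[]) {
  if (order.length === 0) {
    throw new Error('order must name at least one provider')
  }
  return { gateway: { order } }
}

// Prefer Google Vertex, fall back to Google.
const providerOptions = gatewayProviderOrder(['vertex', 'google'])
console.log(JSON.stringify(providerOptions))
```

The resulting object would be passed as `providerOptions` next to `model` and `prompt` in a `streamText` call like the index.ts example at the top of this page.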

Provider       Legal           Context  Input    Output   Cache read  Web search (per query)  Release date
Google         Terms, Privacy  1M       $0.10/M  $0.40/M  $0.01/M     $35.00/K + input costs  09/25/2025
Google Vertex  Terms, Privacy  1M       $0.10/M  $0.40/M  $0.01/M     $35.00/K + input costs  09/25/2025

Token prices are per 1M tokens (/M); web search is priced per 1,000 queries (/K).
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Google

Context  Latency  Throughput  Input    Output    Cache read  Web search (per query)  Providers       Release date
1M       0.9s     247 tps     $0.25/M  $1.50/M   $0.03/M     $14.00/K + input costs  Google, Vertex  03/03/2026
1M       3.1s     202 tps     $2.00/M  $12.00/M  $0.20/M     $14.00/K + input costs  Google, Vertex  02/19/2026
1M       0.7s     190 tps     $0.50/M  $3.00/M   $0.05/M     $14.00/K + input costs  Google, Vertex  12/17/2025
1M       0.4s     252 tps     $0.10/M  $0.40/M   $0.01/M     $35.00/K + input costs  Google, Vertex  06/17/2025
1M       0.4s     218 tps     $0.30/M  $2.50/M   $0.03/M     $35.00/K + input costs  Google, Vertex  03/20/2025
1M       1.9s     134 tps     $1.25/M  $10.00/M  $0.13/M     $35.00/K + input costs  Google, Vertex  03/20/2025

What To Consider When Choosing a Provider

  • Configuration: This is a preview model. Behavior may change, or Google may deprecate it with two weeks' notice. Pin to the explicit model identifier in production and monitor for deprecation announcements.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
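
A startup check can fail fast when neither credential is present. This is a minimal sketch with a hypothetical helper; the environment variable names AI_GATEWAY_API_KEY and VERCEL_OIDC_TOKEN, and the choice to prefer the API key when both are set, are assumptions to confirm against the AI Gateway authentication docs.

```typescript
// Hypothetical startup check: which AI Gateway credential is available?
// Assumed variable names: AI_GATEWAY_API_KEY for an API key and
// VERCEL_OIDC_TOKEN for an OIDC token. Preferring the API key is also assumed.
type GatewayAuth = 'api-key' | 'oidc' | 'none'

function detectGatewayAuth(env: Record<string, string | undefined>): GatewayAuth {
  if (env.AI_GATEWAY_API_KEY) return 'api-key'
  if (env.VERCEL_OIDC_TOKEN) return 'oidc'
  return 'none'
}

// Example with a hand-built environment object.
console.log(detectGatewayAuth({ VERCEL_OIDC_TOKEN: 'token-from-vercel' }))
```

In a Node process you would call `detectGatewayAuth(process.env)` at startup and exit with a clear message when it returns 'none'.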

When to Use Gemini 2.5 Flash Lite Preview 09-2025

Best For

  • Cost-sensitive pipelines: Up to 50% fewer output tokens translates directly into lower spend at high volume
  • Audio transcription and summarization: Improved multimodal handling produces more accurate text from audio inputs
  • Image understanding tasks: Benefit from the enhanced visual analysis in this preview
  • Multilingual translation workloads: Improved translation capabilities reduce post-processing
  • System prompt-heavy applications: Rely on precise instruction following for structured output
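
For the audio use case, a request pairs a text instruction with the audio bytes in one user message. The sketch below only builds that message; the local types mirror the AI SDK 5 shape (a `text` part plus a `file` part with `data` and `mediaType`), but earlier SDK versions named the field `mimeType`, so check your version. The helper name and prompt wording are hypothetical.

```typescript
// Minimal local types mirroring a multimodal user message
// (text part + file part); verify field names against your SDK version.
type TextPart = { type: 'text'; text: string }
type FilePart = { type: 'file'; data: Uint8Array; mediaType: string }
type UserMessage = { role: 'user'; content: Array<TextPart | FilePart> }

// Hypothetical helper: wrap raw audio bytes in a transcription request.
function transcriptionMessages(audio: Uint8Array, mediaType = 'audio/mpeg'): UserMessage[] {
  return [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Transcribe this audio verbatim, then summarize it in three sentences.' },
        { type: 'file', data: audio, mediaType },
      ],
    },
  ]
}

const messages = transcriptionMessages(new Uint8Array([0x49, 0x44, 0x33]))
console.log(messages[0].content.map((part) => part.type).join(', '))
```

The returned array would go in the `messages` field of `streamText` or `generateText`, with `model` set to google/gemini-2.5-flash-lite-preview-09-2025.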

Consider Alternatives When

  • Production stability required: Pin to the stable Gemini 2.5 Flash Lite instead of a preview release
  • Deep reasoning tasks: Your task depends on extended chain-of-thought reasoning, where Gemini 2.5 Flash or 2.5 Pro is a better fit
  • Native image or audio output: Flash Lite produces text output only
  • Thinking by default: Flash Lite reasons only when you set a thinking budget (off by default), while 2.5 Flash and 2.5 Pro think by default

Conclusion

This preview shows where Google is taking the Flash Lite tier: tighter instruction following, less output verbosity, and stronger multimodal input handling. Evaluate it against the stable Flash Lite to decide whether the improvements justify using a preview model in your pipeline.

Frequently Asked Questions

  • What improved in Gemini 2.5 Flash Lite Preview 09-2025 compared to the stable Flash Lite?

    Three areas: instruction following for complex prompts, output verbosity (up to 50% fewer tokens), and multimodal capabilities including audio transcription, image understanding, and translation.

  • Is Gemini 2.5 Flash Lite Preview 09-2025 a stable release?

    No. It's a preview release for developer feedback. Google provides a two-week deprecation notice before rotating preview models. Pin to the explicit model string if you need consistent behavior.

  • How much does the reduced verbosity save on cost?

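    It depends on your mix of input and output. Output tokens are billed at $0.40/M versus $0.10/M for input, so cutting output by up to 50% cuts the output portion of your bill by the same fraction while input costs stay the same. Rates on this page reflect the providers routing through AI Gateway and change when providers update their pricing.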

  • Does Gemini 2.5 Flash Lite Preview 09-2025 generate images or audio?

    No. Like the stable Flash Lite, this model accepts multimodal inputs (text, images, audio, documents) but produces text output only.

  • How do I authenticate requests to Gemini 2.5 Flash Lite Preview 09-2025 through AI Gateway?

    Use a Vercel API key or OIDC token with AI Gateway. Use the identifier google/gemini-2.5-flash-lite-preview-09-2025 in your requests. AI Gateway handles provider routing and failover.

  • What is the -latest alias system?

    Google introduced aliases like gemini-flash-lite-latest that automatically point to the newest preview. These rotate with two-week deprecation notices. Use explicit model strings for reproducibility.

  • Should I migrate from stable Flash Lite to this preview?

    Evaluate it in a staging environment first. The preview improves instruction following and reduces token usage, but behavior may change before it reaches stable. Use AI Gateway's observability to compare quality and cost side by side.
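
When comparing the two models in staging, a small helper can turn per-request output-token counts (for example from AI Gateway observability or the usage your SDK reports) into a single reduction figure. The function names and the sample numbers below are hypothetical and illustrative, not measurements.

```typescript
// Hypothetical helper: mean of a list of per-request token counts.
function meanOf(xs: number[]): number {
  if (xs.length === 0) {
    throw new Error('need at least one sample')
  }
  return xs.reduce((sum, x) => sum + x, 0) / xs.length
}

// Fraction of output tokens the preview saves relative to the stable model
// on the same prompts (0.5 means 50% fewer; negative means more verbose).
function outputReduction(stable: number[], preview: number[]): number {
  return 1 - meanOf(preview) / meanOf(stable)
}

// Illustrative counts only: replace with your own usage data.
const stableTokens = [400, 600]
const previewTokens = [280, 320]
console.log(`${(outputReduction(stableTokens, previewTokens) * 100).toFixed(1)}% fewer output tokens`)
```

Run both models on the same prompt set so the counts are comparable, and check output quality alongside the token figures.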