Gemini 2.5 Flash Lite Preview 09-2025
Gemini 2.5 Flash Lite Preview 09-2025 is Google's September 2025 preview of the next Flash Lite generation, delivering better instruction following, up to 50% fewer output tokens, and improved multimodal understanding, including audio transcription and image analysis.
import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-2.5-flash-lite-preview-09-2025',
  prompt: 'Why is the sky blue?',
})

What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
This is a preview model. Behavior may change, or Google may deprecate it with two weeks' notice. Pin to the explicit model identifier in production and monitor for deprecation announcements.
When to Use Gemini 2.5 Flash Lite Preview 09-2025
Best For
Cost-sensitive pipelines:
Up to 50% fewer output tokens translates directly to lower spend at high volume
Audio transcription and summarization:
Improved multimodal handling produces more accurate text from audio inputs
Image understanding tasks:
Benefit from the enhanced visual analysis in this preview
Multilingual translation workloads:
Improved translation capabilities reduce post-processing
System prompt-heavy applications:
Rely on precise instruction following for structured output
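The cost impact of the token reduction above can be sketched with simple arithmetic. The per-token price below is a hypothetical placeholder, not this model's actual rate; check the pricing listed on this page.

```typescript
// Rough monthly-savings estimate from reduced output verbosity.
// PRICE_PER_OUTPUT_TOKEN is a hypothetical rate for illustration only.
const PRICE_PER_OUTPUT_TOKEN = 0.4 / 1_000_000 // assumed $0.40 per 1M output tokens

function monthlySavings(
  requestsPerMonth: number,
  avgOutputTokensBefore: number,
  reduction = 0.5, // "up to 50% fewer output tokens"
): number {
  const tokensSaved = requestsPerMonth * avgOutputTokensBefore * reduction
  return tokensSaved * PRICE_PER_OUTPUT_TOKEN
}

// 10M requests/month averaging 400 output tokens: ~$800 saved at the assumed rate
console.log(monthlySavings(10_000_000, 400))
```

At high request volumes the savings scale linearly with both traffic and average response length, which is why the verbosity reduction matters most for batch and pipeline workloads.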
Consider Alternatives When
Production stability required:
Pin to the stable Gemini 2.5 Flash Lite instead of a preview release
Deep reasoning tasks:
Tasks requiring extended chain-of-thought reasoning are a better fit for Gemini 2.5 Flash or 2.5 Pro
Native image or audio output:
Flash Lite produces text output only
Configurable thinking budgets:
A 2.5 Flash feature, not available in Flash Lite
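If you want to trial the preview while defaulting to the stable tier, a single model-selection helper keeps the switch in one place. This is a minimal sketch: the stable identifier `google/gemini-2.5-flash-lite` and the `USE_PREVIEW` environment flag are assumptions for illustration.

```typescript
// Default to the stable Flash Lite; opt into the preview explicitly.
// USE_PREVIEW is a hypothetical flag name, not an AI Gateway convention.
const STABLE = 'google/gemini-2.5-flash-lite'
const PREVIEW = 'google/gemini-2.5-flash-lite-preview-09-2025'

function modelId(usePreview = process.env.USE_PREVIEW === '1'): string {
  return usePreview ? PREVIEW : STABLE
}
```

Routing the choice through one function makes it trivial to roll back to stable if the preview's behavior shifts before general availability.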
Conclusion
This preview shows where Google is taking the Flash Lite tier: tighter instruction following, less output verbosity, and stronger multimodal input handling. Evaluate it against the stable Flash Lite to decide whether the improvements justify using a preview model in your pipeline.
FAQ
What improvements does this preview bring?
Three areas: instruction following for complex prompts, output verbosity (up to 50% fewer tokens), and multimodal capabilities including audio transcription, image understanding, and translation.
Is this a stable release?
No. It's a preview release for developer feedback. Google provides a two-week deprecation notice before rotating preview models. Pin to the explicit model string if you need consistent behavior.
How is pricing determined?
Rates are listed on this page. They reflect the pricing of the providers routing through AI Gateway and change when those providers update their rates.
Can the model generate images or audio?
No. Like the stable Flash Lite, this model accepts multimodal inputs (text, images, audio, documents) but produces text output only.
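A multimodal request can be expressed as a message with mixed content parts, in the shape the AI SDK accepts for `messages`. This is a sketch: the image URL is a placeholder, and the exact part types should be checked against the AI SDK version you use.

```typescript
// One user message combining a text part and an image part.
// The URL is a placeholder; replace it with a real image location.
const messages = [
  {
    role: 'user',
    content: [
      { type: 'text', text: 'Describe this image in one sentence.' },
      { type: 'image', image: 'https://example.com/photo.jpg' },
    ],
  },
]

// Passed as streamText({ model: 'google/gemini-2.5-flash-lite-preview-09-2025', messages }),
// the response is text only, regardless of input modality.
```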
How do I access Gemini 2.5 Flash Lite Preview 09-2025?
Use a Vercel API key or OIDC token with AI Gateway. Use the identifier google/gemini-2.5-flash-lite-preview-09-2025 in your requests. AI Gateway handles provider routing and failover.
What about rolling aliases like gemini-flash-lite-latest?
Google introduced aliases like gemini-flash-lite-latest that automatically point to the newest preview. These rotate with two-week deprecation notices. Use explicit model strings for reproducibility.
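A lightweight guard can enforce the pinning advice above by rejecting rolling aliases at startup. The date-suffix pattern is a heuristic assumed for illustration, matching the `-preview-MM-YYYY` shape of this model's identifier.

```typescript
// Heuristic: accept only identifiers ending in an explicit preview date
// suffix, rejecting rolling aliases such as gemini-flash-lite-latest.
// The regex is an assumption based on this model's naming, not an API rule.
const PINNED = /-preview-\d{2}-\d{4}$/

function assertPinned(modelId: string): string {
  if (!PINNED.test(modelId)) {
    throw new Error(`Refusing unpinned model id: ${modelId}`)
  }
  return modelId
}

assertPinned('google/gemini-2.5-flash-lite-preview-09-2025') // passes
// assertPinned('google/gemini-flash-lite-latest')           // throws
```

Running this check once at configuration time catches an accidental alias before it can silently rotate to a different model under you.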
Should I switch to this preview?
Evaluate it in a staging environment first. The preview improves instruction following and reduces token usage, but behavior may change before it reaches stable. Use AI Gateway's observability to compare quality and cost side by side.