Gemini 2.5 Flash Lite Preview 09-2025

google/gemini-2.5-flash-lite-preview-09-2025

Gemini 2.5 Flash Lite Preview 09-2025 is Google's September 2025 preview of the next Flash Lite generation, delivering better instruction following, up to 50% fewer output tokens, and improved multimodal understanding including audio transcription and image analysis.

File Input, Reasoning, Tool Use, Vision (Image), Web Search, Implicit Caching
index.ts
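import { streamText } from 'ai'

// Start a streaming request using the pinned model identifier.
const result = streamText({
  model: 'google/gemini-2.5-flash-lite-preview-09-2025',
  prompt: 'Why is the sky blue?'
})

// Print each text chunk as it arrives.
for await (const chunk of result.textStream) process.stdout.write(chunk)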

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

  • Authentication

    AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
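
As a rough illustration of the two credential options, the sketch below picks an API key when one is present and falls back to an OIDC token otherwise. The helper names are hypothetical, and the environment variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN`, as well as the bearer-token header, are assumptions made for this example; check the AI Gateway documentation for the exact configuration.

```typescript
// Hypothetical helper for illustration only. The environment variable names
// and the bearer-token header are assumptions, not confirmed by this page.
type GatewayCredential =
  | { kind: 'api-key'; token: string }
  | { kind: 'oidc'; token: string }

function resolveGatewayCredential(
  env: Record<string, string | undefined>
): GatewayCredential {
  // Prefer an explicit API key when one is configured.
  const apiKey = env['AI_GATEWAY_API_KEY']?.trim()
  if (apiKey) return { kind: 'api-key', token: apiKey }

  // Otherwise fall back to an OIDC token.
  const oidc = env['VERCEL_OIDC_TOKEN']?.trim()
  if (oidc) return { kind: 'oidc', token: oidc }

  throw new Error('No AI Gateway credential found: set an API key or OIDC token')
}

// In this sketch both credential kinds are sent as a bearer token.
function authHeader(credential: GatewayCredential): Record<string, string> {
  return { Authorization: `Bearer ${credential.token}` }
}
```

In a Node.js process this could be called as `authHeader(resolveGatewayCredential(process.env))`. Either way, no Google provider credential appears in application code.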

This is a preview model. Behavior may change, or Google may deprecate it with two weeks' notice. Pin to the explicit model identifier in production and monitor for deprecation announcements.
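
One way to enforce pinning is a small guard that rejects rotating aliases before a request is sent. This is a minimal sketch: the function names are hypothetical, and treating any identifier that ends in `-latest` as an alias is an assumption based on names such as `gemini-flash-lite-latest`.

```typescript
// The explicit identifier for this preview, as listed on this page.
const PINNED_MODEL = 'google/gemini-2.5-flash-lite-preview-09-2025'

// Assumption: rotating aliases are recognizable by a "-latest" suffix.
function isRotatingAlias(modelId: string): boolean {
  return modelId.endsWith('-latest')
}

// Hypothetical guard: returns the identifier unchanged when it is pinned,
// and throws when it looks like a rotating alias.
function assertPinned(modelId: string): string {
  if (isRotatingAlias(modelId)) {
    throw new Error(`"${modelId}" is a rotating alias; use an explicit model string`)
  }
  return modelId
}
```

Calling `assertPinned(PINNED_MODEL)` at startup, or in a configuration test, catches an alias before it reaches production traffic.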

When to Use Gemini 2.5 Flash Lite Preview 09-2025

Best For

  • Cost-sensitive pipelines:

    Up to 50% fewer output tokens lowers output-token spend proportionally at high volume

  • Audio transcription and summarization:

    Improved multimodal handling produces more accurate text from audio inputs

  • Image understanding tasks:

    Benefit from the enhanced visual analysis in this preview

  • Multilingual translation workloads:

    Improved translation capabilities reduce post-processing

  • System prompt-heavy applications:

    Rely on precise instruction following for structured output
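
To make the cost point concrete, the sketch below estimates daily spend before and after an output-token reduction. The types and function names are hypothetical, and any rates passed in are placeholders rather than published pricing; the current rates are the ones listed on this page.

```typescript
// Hypothetical types for illustration; rates are placeholders, not real pricing.
interface Workload {
  requestsPerDay: number
  inputTokensPerRequest: number
  outputTokensPerRequest: number
}

interface Rates {
  inputPerMillionUsd: number
  outputPerMillionUsd: number
}

// Daily cost = (tokens / 1M) * rate per million, summed over input and output.
function dailyCostUsd(w: Workload, r: Rates): number {
  const inputMillions = (w.requestsPerDay * w.inputTokensPerRequest) / 1_000_000
  const outputMillions = (w.requestsPerDay * w.outputTokensPerRequest) / 1_000_000
  return inputMillions * r.inputPerMillionUsd + outputMillions * r.outputPerMillionUsd
}

// Scale output tokens by (1 - reduction); input tokens are unchanged.
function withOutputReduction(w: Workload, reduction: number): Workload {
  if (reduction < 0 || reduction > 1) {
    throw new RangeError('reduction must be between 0 and 1')
  }
  return { ...w, outputTokensPerRequest: w.outputTokensPerRequest * (1 - reduction) }
}
```

For example, with 1,000,000 requests per day, 1,000 input and 400 output tokens per request, and placeholder rates of $0.10 and $0.40 per million tokens, the baseline is $260 per day and the 50% case is $180 per day. Total spend falls by about 31% rather than 50%, because input tokens are unaffected.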

Consider Alternatives When

  • Production stability required:

    Pin to the stable Gemini 2.5 Flash Lite instead of a preview release

  • Deep reasoning tasks:

    Your task requires chain-of-thought thinking, which Gemini 2.5 Flash or 2.5 Pro fits better

  • Native image or audio output:

    Flash Lite produces text output only

  • Configurable thinking budgets:

    A 2.5 Flash feature, not available in Flash Lite

Conclusion

This preview shows where Google is taking the Flash Lite tier: tighter instruction following, less output verbosity, and stronger multimodal input handling. Evaluate it against the stable Flash Lite to decide whether the improvements justify using a preview model in your pipeline.

FAQ

Three areas: instruction following for complex prompts, output verbosity (up to 50% fewer tokens), and multimodal capabilities including audio transcription, image understanding, and translation.

No. It's a preview release for developer feedback. Google provides a two-week deprecation notice before rotating preview models. Pin to the explicit model string if you need consistent behavior.

Rates are listed on this page. They reflect the providers routing through AI Gateway and shift when providers update their pricing.

No. Like the stable Flash Lite, this model accepts multimodal inputs (text, images, audio, documents) but produces text output only.

Use a Vercel API key or OIDC token with AI Gateway. Use the identifier google/gemini-2.5-flash-lite-preview-09-2025 in your requests. AI Gateway handles provider routing and failover.

Google introduced aliases like gemini-flash-lite-latest that automatically point to the newest preview. These rotate with two-week deprecation notices. Use explicit model strings for reproducibility.

Evaluate it in a staging environment first. The preview improves instruction following and reduces token usage, but behavior may change before it reaches stable. Use AI Gateway's observability to compare quality and cost side by side.