Gemini 2.5 Flash Preview 09-2025
Gemini 2.5 Flash Preview 09-2025 is Google's September 2025 preview of the next Gemini 2.5 Flash, scoring 54% on SWE-Bench Verified (up from 48.9%), improving agentic tool use, and producing 24% fewer output tokens with thinking enabled.
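```typescript
import { streamText } from 'ai'
const result = streamText({ model: 'google/gemini-2.5-flash-preview-09-2025', prompt: 'Why is the sky blue?' })
// Print the response as it streams in
for await (const text of result.textStream) process.stdout.write(text)
```
Playground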
Try out Gemini 2.5 Flash Preview 09-2025 by Google. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers; for this model, AI Gateway lists google and vertex. Set a provider preference using its slug, and see the AI Gateway docs for details. Using a provider means you agree to its terms.
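A minimal sketch of expressing a provider preference in code, assuming the AI SDK's `providerOptions.gateway.order` field accepts an ordered list of provider slugs (confirm against the AI Gateway docs for your SDK version). The slugs `google` and `vertex` come from this page; the helper `gatewayProviderOrder` is hypothetical.

```typescript
// Provider slugs listed for this model on AI Gateway.
type ProviderSlug = 'google' | 'vertex'

// Hypothetical helper: builds a providerOptions object for streamText/generateText.
// Assumes AI Gateway reads `gateway.order` as "try these providers in this order".
function gatewayProviderOrder(order: ProviderSlug[]): { gateway: { order: ProviderSlug[] } } {
  if (order.length === 0) throw new Error('order must list at least one provider')
  if (new Set(order).size !== order.length) throw new Error('order must not repeat a provider')
  return { gateway: { order: [...order] } }
}

// Usage: prefer vertex, fall back to google.
const providerOptions = gatewayProviderOrder(['vertex', 'google'])
console.log(JSON.stringify(providerOptions))
```

Pass the returned object as `providerOptions` in the `streamText` call; validating up front catches typos in slugs at compile time and duplicates at runtime.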
About Gemini 2.5 Flash Preview 09-2025
Gemini 2.5 Flash Preview 09-2025 is a preview release from Google dated September 25, 2025. It advances the hybrid reasoning model that defined the 2.5 Flash tier. Two improvements stand out.
First, agentic tool use. Google called out better performance on complex, multi-step applications. The SWE-Bench Verified score rose from 48.9% to 54%, a gain of 5.1 percentage points (about 10% relative) on real-world software engineering tasks such as bug fixes and feature implementations drawn from production codebases.
Second, efficiency with thinking enabled. The preview produces 24% fewer output tokens than the stable model when thinking is active. Because thinking tokens are billed as output tokens, fewer of them means lower cost and faster responses on reasoning-intensive prompts, with quality that Google reports as maintained.
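A worked example of the cost arithmetic. The output price and token volume below are hypothetical placeholders, not this model's actual rates; the only number taken from this page is the 24% reduction. Since the per-token rate is unchanged, the saving on output spend equals the reduction in output tokens.

```typescript
// Hypothetical numbers for illustration only; substitute your real price and traffic.
const PRICE_PER_MILLION_OUTPUT_USD = 2.0 // assumed output price, USD per 1M tokens
const OUTPUT_TOKEN_REDUCTION = 0.24      // Google's reported reduction with thinking enabled

// Output cost in USD for a given number of output tokens.
function outputCostUsd(outputTokens: number, pricePerMillion: number): number {
  return (outputTokens / 1_000_000) * pricePerMillion
}

// Assumed baseline: output tokens (thinking included) the stable model spends on a workload.
const stableOutputTokens = 10_000_000
const previewOutputTokens = stableOutputTokens * (1 - OUTPUT_TOKEN_REDUCTION)

const stableCost = outputCostUsd(stableOutputTokens, PRICE_PER_MILLION_OUTPUT_USD)
const previewCost = outputCostUsd(previewOutputTokens, PRICE_PER_MILLION_OUTPUT_USD)
console.log(stableCost.toFixed(2), previewCost.toFixed(2), (stableCost - previewCost).toFixed(2))
```

Input tokens are unaffected by this change, so the saving applies only to the output side of the bill.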
Like the Flash Lite preview released alongside it, this model is also reachable through Google's new -latest aliases, which point to the newest preview and change with two-week deprecation notices. Pin to gemini-2.5-flash-preview-09-2025 for stable evaluation. Google intends the preview for feedback collection, not as a direct replacement for the stable 2.5 Flash.
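One way to enforce the pinning advice in an evaluation harness is to keep the dated model ID in a single constant and refuse floating aliases. This is a hypothetical guard, not an AI Gateway feature; it treats any ID ending in `-latest` as floating, which matches Google's `gemini-flash-latest` naming but may not catch every alias scheme.

```typescript
// Dated, immutable model ID on AI Gateway (from this page).
const PINNED_MODEL_ID = 'google/gemini-2.5-flash-preview-09-2025'

// Hypothetical guard: floating aliases can be repointed with two weeks' notice,
// so evaluation runs refuse them and only accept dated IDs.
function assertPinned(modelId: string): string {
  if (modelId.endsWith('-latest')) {
    throw new Error(`Refusing floating alias "${modelId}"; use a dated model ID for evaluations`)
  }
  return modelId
}

const evalModel = assertPinned(PINNED_MODEL_ID)
console.log(evalModel)
```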
What To Consider When Choosing a Provider
- Configuration: This is a preview model. The 24% reduction in output tokens with thinking enabled changes cost profiles for reasoning-heavy workloads. Benchmark it against the stable 2.5 Flash using AI Gateway's observability tools before committing production traffic.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
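A small preflight check for the two credential types, assuming the AI SDK's gateway provider reads `AI_GATEWAY_API_KEY` for an API key and `VERCEL_OIDC_TOKEN` for an OIDC token (verify these names against your SDK version). The function takes the environment as a parameter so it can be tested without touching `process.env`; its name and return values are hypothetical.

```typescript
type GatewayAuth = 'api-key' | 'oidc' | 'none'

// Assumed environment variable names; this sketch checks the API key first.
function detectGatewayAuth(env: Record<string, string | undefined>): GatewayAuth {
  if (env.AI_GATEWAY_API_KEY) return 'api-key'
  if (env.VERCEL_OIDC_TOKEN) return 'oidc'
  return 'none'
}

console.log(detectGatewayAuth({ VERCEL_OIDC_TOKEN: 'example-token' }))
```

In an application you would call `detectGatewayAuth(process.env)` at startup and fail fast on `'none'`, rather than discovering a missing credential on the first model request.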
When to Use Gemini 2.5 Flash Preview 09-2025
Best For
- Agentic coding pipelines: The 54% SWE-Bench Verified score represents a concrete improvement over the stable 2.5 Flash
- Multi-step tool use applications: Benefit from improved agentic capabilities across complex workflows
- Reasoning-intensive workloads with cost constraints: 24% fewer output tokens with thinking enabled can materially reduce spend
- Hybrid reasoning applications: Apps that toggle thinking on and off per request and need better cost efficiency when thinking is active
- Software engineering automation: Bug triage, code review, and feature implementation
Consider Alternatives When
- Production stability required: Use the stable Gemini 2.5 Flash for production workloads
- Simple classification or extraction: Thinking adds cost without benefit, and Gemini 2.5 Flash Lite is cheaper
- Deepest reasoning needed: Gemini 2.5 Pro targets the hardest problems when cost is a secondary concern
Conclusion
Gemini 2.5 Flash Preview 09-2025 delivers two concrete gains over the stable 2.5 Flash: stronger agentic tool use (54% SWE-Bench Verified) and cheaper thinking (24% fewer output tokens). For teams already using 2.5 Flash on reasoning or coding tasks, this preview is worth benchmarking in staging.
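A sketch of the comparison step in such a staging benchmark, assuming you have already run the same prompts on both models and collected each response's `usage` object from the AI SDK (`outputTokens` follows AI SDK 5's usage shape). Whether thinking tokens are folded into `outputTokens` or reported separately in `reasoningTokens` depends on the provider integration, so check both. The sample numbers are made up.

```typescript
// Subset of the AI SDK usage object; fields may be undefined if a provider omits them.
interface Usage {
  outputTokens?: number
}

// Total output tokens across a set of runs, treating missing counts as zero.
function sumOutput(runs: Usage[]): number {
  return runs.reduce((total, u) => total + (u.outputTokens ?? 0), 0)
}

// Relative change in output tokens; negative means the candidate uses fewer tokens.
function relativeChange(baseline: Usage[], candidate: Usage[]): number {
  const base = sumOutput(baseline)
  if (base === 0) throw new Error('baseline has no output tokens')
  return (sumOutput(candidate) - base) / base
}

// Made-up sample: the same three prompts on stable 2.5 Flash and on the 09-2025 preview.
const stable: Usage[] = [{ outputTokens: 1200 }, { outputTokens: 800 }, { outputTokens: 2000 }]
const preview: Usage[] = [{ outputTokens: 950 }, { outputTokens: 700 }, { outputTokens: 1550 }]

console.log(`${(relativeChange(stable, preview) * 100).toFixed(1)}% output tokens`)
```

Comparing totals over the same prompt set keeps the measurement tied to your workload instead of to the headline benchmark figure.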
Frequently Asked Questions
What is the SWE-Bench Verified improvement in Gemini 2.5 Flash Preview 09-2025?
The score moved from 48.9% on the stable model to 54% on this preview. SWE-Bench Verified tests a model's ability to resolve real software engineering tasks from production repositories.
How much do thinking tokens cost with Gemini 2.5 Flash Preview 09-2025?
Google reported a 24% reduction in output tokens compared to the stable model when thinking is active. The per-token rate stays the same, but you consume fewer tokens per reasoning task.
Is this preview a replacement for the stable Gemini 2.5 Flash?
No. Google released it for developer feedback. It may reach stable later or be superseded. Pin to the explicit model string and monitor deprecation notices.
Does Gemini 2.5 Flash Preview 09-2025 support thinking budgets like the stable 2.5 Flash?
Yes. The hybrid reasoning design carries forward. You can disable thinking entirely or set a budget that controls how much deliberation the model applies per request.
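A sketch of per-request thinking settings, assuming the AI SDK's Google provider option `thinkingConfig.thinkingBudget` is forwarded through AI Gateway under `providerOptions.google`. The bounds used here (0 disables thinking, -1 lets the model decide, up to 24,576 tokens) are Google's documented range for the stable 2.5 Flash and are assumed to carry over to this preview. The helper `thinkingOptions` is hypothetical.

```typescript
// Google's documented thinking budget bounds for Gemini 2.5 Flash (assumed to apply to the preview).
const MAX_FLASH_THINKING_BUDGET = 24_576
const DYNAMIC_THINKING = -1 // let the model decide how much to think

// Hypothetical helper returning a providerOptions object for streamText/generateText.
function thinkingOptions(budget: number) {
  const valid = budget === DYNAMIC_THINKING ||
    (Number.isInteger(budget) && budget >= 0 && budget <= MAX_FLASH_THINKING_BUDGET)
  if (!valid) {
    throw new Error(`thinking budget ${budget} is not -1 and not an integer in [0, ${MAX_FLASH_THINKING_BUDGET}]`)
  }
  return { google: { thinkingConfig: { thinkingBudget: budget } } }
}

const noThinking = thinkingOptions(0)         // simple extraction: skip thinking entirely
const boundedThinking = thinkingOptions(2048) // cap deliberation on harder prompts
console.log(JSON.stringify(noThinking), JSON.stringify(boundedThinking))
```

Pass either object as `providerOptions` in the `streamText` call, choosing the budget per request.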
How do I authenticate requests to Gemini 2.5 Flash Preview 09-2025 through AI Gateway?
Use a Vercel API key or OIDC token with AI Gateway. Use the identifier `google/gemini-2.5-flash-preview-09-2025` in your API calls. AI Gateway manages routing, retries, and failover across google and vertex.
Can I use the `-latest` alias instead of the dated preview string?
Google offers aliases like `gemini-flash-latest` that auto-update to the newest preview. These rotate with two-week deprecation notices. Use the explicit `gemini-2.5-flash-preview-09-2025` string for reproducible behavior.
How does Gemini 2.5 Flash Preview 09-2025 compare to Gemini 2.5 Pro?
2.5 Flash (including this preview) sits on the cost-performance frontier, delivering strong reasoning at lower cost than 2.5 Pro. For the hardest problems, where the benchmark gains justify the premium, 2.5 Pro remains the stronger choice in the 2.5 family.