Gemini 2.5 Flash Preview 09-2025
Gemini 2.5 Flash Preview 09-2025 is Google's September 2025 preview of the next Gemini 2.5 Flash release. It scores 54% on SWE-Bench Verified (up from 48.9%), improves agentic tool use, and produces 24% fewer output tokens when thinking is enabled.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-2.5-flash-preview-09-2025',
  prompt: 'Why is the sky blue?',
})
```

What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
This is a preview model. The 24% reduction in thinking tokens changes cost profiles for reasoning-heavy workloads. Benchmark against the stable 2.5 Flash using AI Gateway's observability tools before committing production traffic.
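To see how the token reduction plays out in a cost profile, here is a minimal sketch of per-request thinking spend. The baseline token count and per-million-token price below are hypothetical placeholders, not published figures; substitute your own observed usage and current pricing.

```typescript
// Estimate thinking-token spend before and after the reported 24% reduction.
// Both inputs are assumptions for illustration, not real pricing data.
function thinkingCost(thinkingTokens: number, pricePerMillionTokens: number): number {
  return (thinkingTokens / 1_000_000) * pricePerMillionTokens
}

const baselineThinkingTokens = 10_000 // assumed tokens per request on stable 2.5 Flash
const pricePerMillionTokens = 2.5     // assumed output price in USD per million tokens

const stableCost = thinkingCost(baselineThinkingTokens, pricePerMillionTokens)
const previewCost = thinkingCost(baselineThinkingTokens * 0.76, pricePerMillionTokens)

console.log(stableCost.toFixed(4))  // cost at the baseline token count
console.log(previewCost.toFixed(4)) // cost with 24% fewer thinking tokens
```

The per-token rate is unchanged in the preview, so the saving scales linearly with how much of your spend is thinking tokens.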
When to Use Gemini 2.5 Flash Preview 09-2025
Best For
Agentic coding pipelines:
The 54% SWE-Bench Verified score represents a concrete improvement over the stable 2.5 Flash
Multi-step tool use applications:
Benefit from improved agentic capabilities across complex workflows
Reasoning-intensive workloads with cost constraints:
A 24% reduction in thinking tokens materially reduces spend
Hybrid reasoning applications:
Toggle thinking on or off per request, with better cost efficiency when thinking is active
Software engineering automation:
Bug triage, code review, and feature implementation
Consider Alternatives When
Production stability required:
Use the stable Gemini 2.5 Flash for production workloads
Simple classification or extraction:
Thinking adds cost without benefit, and Gemini 2.5 Flash Lite is cheaper
Deepest reasoning needed:
Gemini 2.5 Pro targets the hardest problems when cost is not a constraint
Conclusion
Gemini 2.5 Flash Preview 09-2025 delivers two concrete gains over the stable 2.5 Flash: stronger agentic tool use (54% SWE-Bench Verified) and cheaper thinking (24% fewer output tokens). For teams already using 2.5 Flash on reasoning or coding tasks, this preview is worth benchmarking in staging.
FAQ
How much did the SWE-Bench Verified score improve?
The score moved from 48.9% on the stable model to 54% on this preview. SWE-Bench Verified tests a model's ability to resolve real software engineering tasks from production repositories.
How does the thinking-token reduction affect cost?
Google reported a 24% reduction in output tokens compared to the stable model when thinking is active. The per-token rate stays the same, but you consume fewer tokens per reasoning task.
Is this a stable production model?
No. Google released it for developer feedback. It may reach stable later or be superseded. Pin to the explicit model string and monitor deprecation notices.
Does the preview support configurable thinking budgets?
Yes. The hybrid reasoning design carries forward. You can disable thinking entirely or set a budget that controls how much deliberation the model applies per request.
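As a sketch of how a per-request budget might be expressed, the helper below builds a provider-options object in the shape the AI SDK's Google provider uses for thinking configuration. Treat the exact field names (`google`, `thinkingConfig`, `thinkingBudget`) as an assumption to verify against current AI SDK documentation.

```typescript
// Build per-request provider options that disable thinking or cap its budget.
// The nested shape below is assumed from the AI SDK Google provider; verify
// field names against the current docs before relying on it.
type ThinkingOptions = {
  google: { thinkingConfig: { thinkingBudget: number } }
}

function withThinkingBudget(budgetTokens: number): ThinkingOptions {
  return { google: { thinkingConfig: { thinkingBudget: budgetTokens } } }
}

const noThinking = withThinkingBudget(0)        // disable thinking entirely
const cappedThinking = withThinkingBudget(2048) // cap deliberation per request

// Passed alongside the model and prompt, e.g.:
// streamText({ model: 'google/gemini-2.5-flash-preview-09-2025',
//              prompt, providerOptions: cappedThinking })
```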
How do I access this model through AI Gateway?
Use a Vercel API key or OIDC token with AI Gateway. Use the identifier google/gemini-2.5-flash-preview-09-2025 in your API calls. AI Gateway manages routing, retries, and failover across the google and vertex providers.
Should I use an alias instead of the dated model string?
Google offers aliases like gemini-flash-latest that auto-update to the newest preview. These rotate with two-week deprecation notices. Use the explicit gemini-2.5-flash-preview-09-2025 string for reproducible behavior.
How does 2.5 Flash compare to 2.5 Pro?
2.5 Flash (including this preview) sits on the cost-performance frontier, delivering strong reasoning at lower cost than 2.5 Pro. For the hardest problems, where benchmark scores justify the premium, 2.5 Pro remains the top of the 2.5 family.