Gemini 2.5 Flash Lite

Gemini 2.5 Flash Lite is the fastest and most affordable model in the Gemini 2.5 family. It offers configurable thinking, a 1M-token context window, and benchmark improvements over 2.0 Flash-Lite across coding, math, and science, at a price designed for high-throughput agentic pipelines.

File Input · Reasoning · Tool Use · Vision (Image) · Web Search · Implicit Caching
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-2.5-flash-lite',
  prompt: 'Why is the sky blue?',
})

// Print the response as it streams in
for await (const chunk of result.textStream) process.stdout.write(chunk)

What To Consider When Choosing a Provider

  • Configuration: Applications using the thinking feature should benchmark total token cost under realistic thinking budgets, since thinking tokens count toward output costs (see the sketch after this list).
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
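
To get a feel for the cost consideration above, the sketch below runs the same prompt at two thinking budgets and logs the reported token usage. It assumes the AI SDK's generateText call and the Google provider's thinkingConfig option pass through the gateway via providerOptions; the budget values are illustrative.

import { generateText } from 'ai'

// Compare token usage under two thinking budgets (values are illustrative)
for (const thinkingBudget of [512, 4096]) {
  const { usage } = await generateText({
    model: 'google/gemini-2.5-flash-lite',
    prompt: 'A train leaves at 9:12 and arrives at 11:47. How long is the trip?',
    providerOptions: {
      google: { thinkingConfig: { thinkingBudget } },
    },
  })
  // Thinking tokens count toward output tokens, so usage should grow with the budget
  console.log({ thinkingBudget, usage })
}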

When to Use Gemini 2.5 Flash Lite

Best For

  • High-volume agentic pipelines needing occasional reasoning: The thinking toggle allows selective deliberation on harder steps without paying full 2.5 Flash prices for every call in the pipeline
  • Migrating from 2.0 Flash-Lite: Benchmark improvements across coding and math mean the upgrade delivers measurable quality gains on common developer tasks at comparable cost
  • Latency-sensitive applications within the 2.5 family: When 2.5 Flash or 2.5 Pro latency is too high for the user experience, Flash-Lite provides 2.5-generation quality at the fastest 2.5 response times
  • Translation, classification, and data extraction at scale: Strong instruction following and fast responses make it a reliable workhorse for structured-output production tasks (see the sketch below)
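
As a sketch of the structured-output workloads in the last bullet, the AI SDK's generateObject can pair the model with a schema; the schema and prompt here are illustrative.

import { generateObject } from 'ai'
import { z } from 'zod'

// Extract typed fields from free-form text (schema is illustrative)
const { object } = await generateObject({
  model: 'google/gemini-2.5-flash-lite',
  schema: z.object({
    name: z.string(),
    email: z.string(),
    intent: z.enum(['sales', 'support', 'other']),
  }),
  prompt: 'Extract contact details and intent from: "Hi, I\'m Dana (dana@example.com). I can\'t reset my password."',
})
console.log(object) // e.g. { name: 'Dana', email: 'dana@example.com', intent: 'support' }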

Consider Alternatives When

  • Maximum reasoning depth is required: 2.5 Flash or 2.5 Pro with uncapped thinking budget is more appropriate for the most complex multi-step problems
  • Image generation is needed: Gemini 2.5 Flash Lite does not generate images. Gemini models with native image output are available in the 2.5 Flash Image and 3.x families
  • Your workload is pure annotation/extraction without reasoning: For text-output-only extraction at maximum cost efficiency, 2.0 Flash-Lite's lower price floor may be preferable

Conclusion

Gemini 2.5 Flash Lite closes the gap between 2.0 Flash-Lite and the full 2.5 Flash tier. It delivers stronger benchmark performance and adds thinking capability while keeping the latency profile teams already depend on. For 2.0 Flash-Lite users, it's the natural upgrade.

Frequently Asked Questions

  • What thinking levels does Gemini 2.5 Flash Lite support?

    Four levels: minimal, low, medium, and high, set per request. Thinking tokens are added to the output token count, so higher thinking levels increase both quality and cost.

  • How does Gemini 2.5 Flash Lite compare to 2.0 Flash-Lite in benchmark performance?

    Gemini 2.5 Flash Lite shows cross-category benchmark improvements in coding, mathematics, science, and reasoning over 2.0 Flash-Lite.

  • Does Gemini 2.5 Flash Lite support image and audio inputs?

    Yes. The model accepts multimodal inputs, including images, audio, and documents alongside text, within the 1M-token context window.
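
    As a minimal sketch, an image can be passed as part of the message content in the AI SDK; the URL is a placeholder.

    import { generateText } from 'ai'

    // Combine a text instruction with an image input (URL is a placeholder)
    const { text } = await generateText({
      model: 'google/gemini-2.5-flash-lite',
      messages: [
        {
          role: 'user',
          content: [
            { type: 'text', text: 'Describe this chart in one sentence.' },
            { type: 'image', image: new URL('https://example.com/chart.png') },
          ],
        },
      ],
    })
    console.log(text)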

  • What is the latency profile compared to 2.5 Flash?

    Gemini 2.5 Flash Lite is the lowest-latency model in the 2.5 family: it provides 2.5-generation capability with first-token times lower than 2.5 Flash or 2.5 Pro.

  • When does it make sense to use thinking in Gemini 2.5 Flash Lite?

    When a subset of requests in a pipeline hits problems that need more deliberation, such as math problems, multi-step instructions, or ambiguous classification. Setting thinking to minimal for routine requests and medium for flagged hard ones keeps average cost low.
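
    A minimal sketch of that routing pattern, assuming a hypothetical isHard heuristic and reusing the Google provider's thinkingConfig option, with budget values standing in for the minimal and medium levels:

    import { generateText } from 'ai'

    // Hypothetical heuristic for flagging requests that need more deliberation
    const isHard = (prompt: string) => /solve|prove|step by step|ambiguous/i.test(prompt)

    async function classify(prompt: string) {
      const { text } = await generateText({
        model: 'google/gemini-2.5-flash-lite',
        prompt,
        providerOptions: {
          // Budgets stand in for the minimal / medium thinking levels
          google: { thinkingConfig: { thinkingBudget: isHard(prompt) ? 2048 : 128 } },
        },
      })
      return text
    }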

  • How do I use Gemini 2.5 Flash Lite on AI Gateway?

    Use the identifier google/gemini-2.5-flash-lite with any supported interface. Set thinking level via provider options in the AI SDK or via request parameters in direct API calls.
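
    For a direct API call, a sketch assuming the gateway's OpenAI-compatible endpoint and a reasoning_effort-style parameter for the thinking level (both the endpoint path and the parameter name are assumptions):

    const res = await fetch('https://ai-gateway.vercel.sh/v1/chat/completions', {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'google/gemini-2.5-flash-lite',
        messages: [{ role: 'user', content: 'Why is the sky blue?' }],
        reasoning_effort: 'low', // assumed mapping for the thinking level
      }),
    })
    console.log((await res.json()).choices[0].message.content)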