
Gemini 2.5 Flash Lite

Gemini 2.5 Flash Lite is the fastest and most affordable model in the Gemini 2.5 family. It offers configurable thinking, a 1M-token context window, and benchmark improvements over 2.0 Flash-Lite across coding, math, and science, at a price designed for high-throughput agentic pipelines.

File Input · Reasoning · Tool Use · Vision (Image) · Web Search · Implicit Caching
index.ts
import { streamText } from 'ai'

// Stream a completion from Gemini 2.5 Flash Lite via AI Gateway
const result = streamText({
  model: 'google/gemini-2.5-flash-lite',
  prompt: 'Why is the sky blue?',
})

for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

Frequently Asked Questions

  • What thinking levels does Gemini 2.5 Flash Lite support?

    Four levels: minimal, low, medium, and high. You set the level per request. Thinking tokens are added to output token count, so higher thinking levels increase both quality and cost.
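    Because thinking tokens are billed as output tokens, the cost impact of a thinking level can be estimated up front. A minimal sketch, assuming hypothetical per-million-token prices and an illustrative thinking-token count (real figures come from the model's pricing page and the response's usage metadata):

    ```typescript
    // Hypothetical per-million-token prices; substitute real Gateway pricing.
    const PRICE_PER_M_INPUT = 0.1
    const PRICE_PER_M_OUTPUT = 0.4

    // Thinking tokens count toward output, so they are priced as output tokens.
    function estimateCostUSD(
      inputTokens: number,
      outputTokens: number,
      thinkingTokens: number,
    ): number {
      const billedOutput = outputTokens + thinkingTokens
      return (
        (inputTokens / 1_000_000) * PRICE_PER_M_INPUT +
        (billedOutput / 1_000_000) * PRICE_PER_M_OUTPUT
      )
    }

    // Same request with and without thinking: the delta is the thinking cost.
    const base = estimateCostUSD(2_000, 500, 0)
    const withThinking = estimateCostUSD(2_000, 500, 4_000)
    ```

    The same arithmetic applied to a pipeline's traffic mix shows why routing only hard requests to higher levels keeps average cost low.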

  • How does Gemini 2.5 Flash Lite compare to 2.0 Flash-Lite in benchmark performance?

    Gemini 2.5 Flash Lite shows cross-category benchmark improvements in coding, mathematics, science, and reasoning over 2.0 Flash-Lite.

  • Does Gemini 2.5 Flash Lite support image and audio inputs?

    Yes, the model accepts multimodal inputs including images, audio, and documents alongside text, within the context window of 1.0M tokens.
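    As a sketch of what a multimodal request can look like in the AI SDK's message format (the part shapes follow the SDK's content-part convention; verify names against the current AI SDK docs):

    ```typescript
    // A user message mixing a text part and an image part, in AI SDK
    // message format. Pass this as `messages` to generateText/streamText
    // with model: 'google/gemini-2.5-flash-lite'.
    const messages = [
      {
        role: 'user' as const,
        content: [
          { type: 'text' as const, text: 'Describe this diagram in one sentence.' },
          // Images can be given as a URL, base64 string, or binary data.
          { type: 'image' as const, image: new URL('https://example.com/diagram.png') },
        ],
      },
    ]
    ```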

  • What is the latency profile compared to 2.5 Flash?

    Gemini 2.5 Flash Lite is the fastest and lowest-latency model in the 2.5 family. It delivers 2.5-generation capability with lower time-to-first-token than 2.5 Flash or 2.5 Pro.

  • When does it make sense to use thinking in Gemini 2.5 Flash Lite?

    When a subset of requests in a pipeline hits problems that need more deliberation, such as math problems, multi-step instructions, or ambiguous classification. Setting thinking to minimal for routine requests and medium for flagged hard ones keeps average cost low.
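    The pattern above can be sketched as a simple router: classify each request, then pick a thinking level per request. The classifier heuristics below are illustrative assumptions; a real pipeline would substitute its own hard-request flags.

    ```typescript
    type ThinkingLevel = 'minimal' | 'low' | 'medium' | 'high'

    // Illustrative heuristics for requests that usually need deliberation.
    function looksHard(prompt: string): boolean {
      return (
        /\d+\s*[-+*\/^]\s*\d+/.test(prompt) || // inline arithmetic
        prompt.split(/\bthen\b/i).length > 2 || // multi-step instructions
        prompt.length > 2_000 // long, possibly ambiguous context
      )
    }

    // Routine requests stay on minimal; flagged hard ones get medium.
    function chooseThinkingLevel(prompt: string): ThinkingLevel {
      return looksHard(prompt) ? 'medium' : 'minimal'
    }
    ```

    The chosen level would then be set per request (via provider options in the AI SDK, or request parameters in direct API calls), so only the flagged subset pays for extra thinking tokens.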

  • How do I use Gemini 2.5 Flash Lite on AI Gateway?

    Use the identifier google/gemini-2.5-flash-lite with any supported interface. Set thinking level via provider options in the AI SDK or via request parameters in direct API calls.
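    As a sketch, a per-request thinking level might be attached to an AI SDK call like this. The `thinkingLevel` option name under the `google` provider key is an assumption for illustration, not a confirmed API; check the Gateway provider-options docs for the exact shape.

    ```typescript
    // Options object for streamText/generateText from the 'ai' package.
    // NOTE: the providerOptions shape below is an assumed example.
    const requestOptions = {
      model: 'google/gemini-2.5-flash-lite',
      prompt: 'Classify this support ticket: "My invoice total looks wrong."',
      providerOptions: {
        google: { thinkingLevel: 'medium' }, // assumed option name
      },
    }
    // Usage: const result = streamText(requestOptions)
    ```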