Gemini 2.5 Flash Lite
Gemini 2.5 Flash Lite is the fastest and most affordable model in the Gemini 2.5 family. It offers configurable thinking, a 1M-token context window, and benchmark improvements over 2.0 Flash-Lite across coding, math, and science, at a price designed for high-throughput agentic pipelines.
import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-2.5-flash-lite',
  prompt: 'Why is the sky blue?',
})

Frequently Asked Questions
What thinking levels does Gemini 2.5 Flash Lite support?
Four levels: minimal, low, medium, and high, set per request. Thinking tokens are counted toward the output token total, so higher thinking levels increase both quality and cost.
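A minimal sketch of setting a thinking level per request. The provider-option field name (`thinkingLevel`) and the helper below are assumptions for illustration; check the AI Gateway provider documentation for the authoritative option shape.

```typescript
// Hypothetical helper that builds per-request provider options for a
// given thinking level. The option key is an assumed name, not a
// verified API field.
type ThinkingLevel = 'minimal' | 'low' | 'medium' | 'high'

function thinkingOptions(level: ThinkingLevel) {
  return { google: { thinkingLevel: level } }
}

// Usage with streamText (shape assumed):
// const result = streamText({
//   model: 'google/gemini-2.5-flash-lite',
//   providerOptions: thinkingOptions('medium'),
//   prompt: 'Solve this multi-step word problem...',
// })
```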
How does Gemini 2.5 Flash Lite compare to 2.0 Flash-Lite in benchmark performance?
Gemini 2.5 Flash Lite shows cross-category benchmark improvements in coding, mathematics, science, and reasoning over 2.0 Flash-Lite.
Does Gemini 2.5 Flash Lite support image and audio inputs?
Yes, the model accepts multimodal inputs including images, audio, and documents alongside text, within the context window of 1.0M tokens.
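A sketch of a multimodal request using AI SDK message parts. The text/image part format follows the AI SDK's multi-part message convention; the image URL is a placeholder and the exact part shape should be verified against the SDK version in use.

```typescript
// Build a user message combining text and an image reference.
// The image URL is a hypothetical example.
const messages = [
  {
    role: 'user' as const,
    content: [
      { type: 'text' as const, text: 'Describe this diagram.' },
      { type: 'image' as const, image: new URL('https://example.com/diagram.png') },
    ],
  },
]

// Usage (assumed shape):
// const result = streamText({ model: 'google/gemini-2.5-flash-lite', messages })
```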
What is the latency profile compared to 2.5 Flash?
Gemini 2.5 Flash Lite is the fastest and lowest-latency model in the 2.5 family. It provides 2.5-generation capability with first-token times lower than 2.5 Flash or 2.5 Pro.
When does it make sense to use thinking in Gemini 2.5 Flash Lite?
When a subset of requests in a pipeline hits problems that need more deliberation, such as math problems, multi-step instructions, or ambiguous classification. Setting thinking to minimal for routine requests and medium for flagged hard ones keeps average cost low.
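The routing pattern above can be sketched as a small dispatch function: routine requests run at minimal thinking, flagged hard ones at medium. The `flaggedHard` field is a placeholder for whatever heuristic or classifier your pipeline uses.

```typescript
// Route each request to a thinking level based on a difficulty flag.
// The flag itself would come from upstream logic (e.g. a cheap
// classifier); it is a hypothetical input here.
type ThinkingLevel = 'minimal' | 'medium'

interface PipelineRequest {
  prompt: string
  flaggedHard: boolean
}

function levelFor(request: PipelineRequest): ThinkingLevel {
  return request.flaggedHard ? 'medium' : 'minimal'
}
```

Because most traffic stays at minimal, the average per-request cost remains close to the non-thinking baseline.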
How do I use Gemini 2.5 Flash Lite on AI Gateway?
Use the identifier google/gemini-2.5-flash-lite with any supported interface. Set the thinking level via provider options in the AI SDK or via request parameters in direct API calls.
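For direct API calls, a request body can be sketched as below. The payload assumes an OpenAI-compatible chat completions interface, and the endpoint URL in the commented usage is an assumption; verify both against the AI Gateway API reference.

```typescript
// Build a chat completions request body for the gateway model
// identifier. The body shape assumes OpenAI-compatible semantics.
function buildRequest(prompt: string) {
  return {
    model: 'google/gemini-2.5-flash-lite',
    messages: [{ role: 'user', content: prompt }],
  }
}

// Usage (endpoint URL and auth header are assumptions):
// const res = await fetch('https://ai-gateway.vercel.sh/v1/chat/completions', {
//   method: 'POST',
//   headers: {
//     Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
//     'Content-Type': 'application/json',
//   },
//   body: JSON.stringify(buildRequest('Why is the sky blue?')),
// })
```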