
Gemini 2.5 Flash Lite

Gemini 2.5 Flash Lite is the fastest and most affordable model in the Gemini 2.5 family. It offers configurable thinking, a 1M-token context window, and benchmark improvements over 2.0 Flash-Lite across coding, math, and science, at a price designed for high-throughput agentic pipelines.

File Input · Reasoning · Tool Use · Vision (Image) · Web Search · Implicit Caching
index.ts
import { streamText } from 'ai'

// Stream a completion from Gemini 2.5 Flash Lite via AI Gateway
const result = streamText({
  model: 'google/gemini-2.5-flash-lite',
  prompt: 'Why is the sky blue?',
})

for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

Frequently Asked Questions

  • What thinking levels does Gemini 2.5 Flash Lite support?

    Four levels: minimal, low, medium, and high. You set the level per request. Thinking tokens are added to output token count, so higher thinking levels increase both quality and cost.
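    Because thinking tokens are billed as output tokens, the cost impact of a thinking level can be estimated up front. A minimal sketch, assuming hypothetical per-million-token prices and an illustrative thinking-token count (real figures come from the model's pricing page and the response's usage metadata):

    ```typescript
    // Hypothetical per-million-token prices; substitute real Gateway pricing.
    const PRICE_PER_M_INPUT = 0.1
    const PRICE_PER_M_OUTPUT = 0.4

    // Thinking tokens count toward output, so they are priced as output tokens.
    function estimateCostUSD(
      inputTokens: number,
      outputTokens: number,
      thinkingTokens: number,
    ): number {
      const billedOutput = outputTokens + thinkingTokens
      return (
        (inputTokens / 1_000_000) * PRICE_PER_M_INPUT +
        (billedOutput / 1_000_000) * PRICE_PER_M_OUTPUT
      )
    }

    // Same request with and without thinking: the delta is the thinking cost.
    const base = estimateCostUSD(2_000, 500, 0)
    const withThinking = estimateCostUSD(2_000, 500, 4_000)
    ```

    The same arithmetic applied to a pipeline's traffic mix shows why routing only hard requests to higher levels keeps average cost low.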

  • How does Gemini 2.5 Flash Lite compare to 2.0 Flash-Lite in benchmark performance?

    Gemini 2.5 Flash Lite shows cross-category benchmark improvements in coding, mathematics, science, and reasoning over 2.0 Flash-Lite.

  • Does Gemini 2.5 Flash Lite support image and audio inputs?

    Yes, the model accepts multimodal inputs including images, audio, and documents alongside text, within the context window of 1.0M tokens.
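    As a sketch of what a multimodal request can look like in the AI SDK's message format (the part shapes follow the SDK's content-part convention; verify names against the current AI SDK docs):

    ```typescript
    // A user message mixing a text part and an image part, in AI SDK
    // message format. Pass this as `messages` to generateText/streamText
    // with model: 'google/gemini-2.5-flash-lite'.
    const messages = [
      {
        role: 'user' as const,
        content: [
          { type: 'text' as const, text: 'Describe this diagram in one sentence.' },
          // Images can be given as a URL, base64 string, or binary data.
          { type: 'image' as const, image: new URL('https://example.com/diagram.png') },
        ],
      },
    ]
    ```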

  • What is the latency profile compared to 2.5 Flash?

    Gemini 2.5 Flash Lite is the fastest and lowest-latency model in the 2.5 family. It delivers 2.5-generation capability with lower time-to-first-token than 2.5 Flash or 2.5 Pro.

  • When does it make sense to use thinking in Gemini 2.5 Flash Lite?

    When a subset of requests in a pipeline hits problems that need more deliberation, such as math problems, multi-step instructions, or ambiguous classification. Setting thinking to minimal for routine requests and medium for flagged hard ones keeps average cost low.
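    The pattern above can be sketched as a simple router: classify each request, then pick a thinking level per request. The classifier heuristics below are illustrative assumptions; a real pipeline would substitute its own hard-request flags.

    ```typescript
    type ThinkingLevel = 'minimal' | 'low' | 'medium' | 'high'

    // Illustrative heuristics for requests that usually need deliberation.
    function looksHard(prompt: string): boolean {
      return (
        /\d+\s*[-+*\/^]\s*\d+/.test(prompt) || // inline arithmetic
        prompt.split(/\bthen\b/i).length > 2 || // multi-step instructions
        prompt.length > 2_000 // long, possibly ambiguous context
      )
    }

    // Routine requests stay on minimal; flagged hard ones get medium.
    function chooseThinkingLevel(prompt: string): ThinkingLevel {
      return looksHard(prompt) ? 'medium' : 'minimal'
    }
    ```

    The chosen level would then be set per request (via provider options in the AI SDK, or request parameters in direct API calls), so only the flagged subset pays for extra thinking tokens.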

  • How do I use Gemini 2.5 Flash Lite on AI Gateway?

    Use the identifier google/gemini-2.5-flash-lite with any supported interface. Set thinking level via provider options in the AI SDK or via request parameters in direct API calls.
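    As a sketch, a per-request thinking level might be attached to an AI SDK call like this. The `thinkingLevel` option name under the `google` provider key is an assumption for illustration, not a confirmed API; check the Gateway provider-options docs for the exact shape.

    ```typescript
    // Options object for streamText/generateText from the 'ai' package.
    // NOTE: the providerOptions shape below is an assumed example.
    const requestOptions = {
      model: 'google/gemini-2.5-flash-lite',
      prompt: 'Classify this support ticket: "My invoice total looks wrong."',
      providerOptions: {
        google: { thinkingLevel: 'medium' }, // assumed option name
      },
    }
    // Usage: const result = streamText(requestOptions)
    ```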