
Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite is the GA release of the efficiency tier in the Gemini 3.1 generation. It improves on 2.5 Flash Lite in reasoning, multimodal understanding, agentic tool use, and long-context performance, and adds four configurable thinking levels and a 1M-token context window.

Reasoning · Tool Use · Implicit Caching · File Input · Vision (Image) · Web Search
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-3.1-flash-lite',
  prompt: 'Why is the sky blue?',
})

// Stream the response text as it arrives.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

Frequently Asked Questions

  • How is Gemini 3.1 Flash Lite different from google/gemini-3.1-flash-lite-preview?

    Gemini 3.1 Flash Lite is the general-availability release of the same efficiency tier in the Gemini 3.1 family. The preview entry remains in the catalog for teams pinned to the earlier identifier; the GA model is the recommended target for new production work.

  • How does Gemini 3.1 Flash Lite compare to Gemini 2.5 Flash Lite?

    Gemini 3.1 Flash Lite outperforms 2.5 Flash Lite on overall quality and lands close to 2.5 Flash across reasoning, multimodal understanding, agentic tool use, and long-context performance. For teams already running 2.5 Flash Lite at scale, it's a quality upgrade within the same lite tier.

  • What thinking levels does Gemini 3.1 Flash Lite support and how do they affect cost?

    Four levels: minimal, low, medium, and high. Higher levels add reasoning compute that contributes to output token counts, so the choice trades off latency and per-request cost against quality on harder inputs.

  • Can I mix thinking levels across requests in the same application?

    Yes. Set thinkingLevel per request in providerOptions.google.thinkingConfig. Routine requests can run at minimal while flagged hard cases run at medium or high without any architectural changes.
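    A minimal sketch of what per-request routing can look like. The helper name `googleOptions` and the difficulty flag are illustrative, not part of the SDK; only the `google.thinkingConfig.thinkingLevel` shape comes from the FAQ above.

    ```typescript
    // The four documented thinking levels for Gemini 3.1 Flash Lite.
    type ThinkingLevel = 'minimal' | 'low' | 'medium' | 'high'

    // Hypothetical helper: build the providerOptions value for one request.
    function googleOptions(level: ThinkingLevel) {
      return { google: { thinkingConfig: { thinkingLevel: level } } }
    }

    // Routine traffic stays cheap; flagged hard cases get more reasoning compute.
    const routine = googleOptions('minimal')
    const hard = googleOptions('high')

    console.log(routine.google.thinkingConfig.thinkingLevel) // minimal
    console.log(hard.google.thinkingConfig.thinkingLevel)    // high
    ```

    The returned object is what you would pass as `providerOptions` on a `streamText` or `generateText` call, so switching levels is a per-call decision rather than a deployment change.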

  • Does Gemini 3.1 Flash Lite support multimodal inputs?

    Yes. Gemini 3.1 Flash Lite accepts text, images, audio, and documents as input within the 1M-token context window and returns text output. Web search and implicit caching are available as runtime options.
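    A sketch of a mixed text-and-image request using the AI SDK's multi-part message shape. The image URL is a placeholder; the prompt text is illustrative.

    ```typescript
    // One user message with two content parts: a text instruction and an image.
    const messages = [
      {
        role: 'user' as const,
        content: [
          { type: 'text' as const, text: 'Describe this image in one sentence.' },
          // Placeholder URL; a Buffer or base64 string also works here.
          { type: 'image' as const, image: new URL('https://example.com/photo.jpg') },
        ],
      },
    ]

    console.log(messages[0].content.length) // 2
    ```

    The `messages` array would be passed to `streamText({ model: 'google/gemini-3.1-flash-lite', messages })` in place of the simple `prompt` string shown at the top of the page.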

  • How do I call Gemini 3.1 Flash Lite on AI Gateway?

    Use the identifier google/gemini-3.1-flash-lite with the AI SDK, the OpenAI-compatible Chat Completions endpoint, the Responses API, or any other supported interface. AI Gateway handles provider routing, retries, and failover automatically.
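    A sketch of the same call through the OpenAI-compatible Chat Completions endpoint with plain `fetch`, assuming the standard AI Gateway base URL and an `AI_GATEWAY_API_KEY` environment variable; adjust both for your setup.

    ```typescript
    // Request body in OpenAI Chat Completions format; only the model
    // identifier changes versus calling OpenAI directly.
    const body = {
      model: 'google/gemini-3.1-flash-lite',
      messages: [{ role: 'user', content: 'Why is the sky blue?' }],
    }

    async function callGateway(): Promise<string> {
      const res = await fetch('https://ai-gateway.vercel.sh/v1/chat/completions', {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(body),
      })
      const json = await res.json()
      return json.choices[0].message.content
    }

    // Only hit the network when a key is configured.
    if (process.env.AI_GATEWAY_API_KEY) {
      callGateway().then(console.log)
    }
    ```

    Because the payload is standard Chat Completions JSON, existing OpenAI client code can usually be pointed at the gateway by swapping the base URL and model identifier.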

  • How does Zero Data Retention work with Gemini 3.1 Flash Lite through AI Gateway?

    Zero Data Retention is available for this model. Because ZDR is offered on a per-provider basis, availability depends on the provider serving the request. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.