Gemini 3.1 Flash Lite
Gemini 3.1 Flash Lite is the GA release of the efficiency tier in the Gemini 3.1 generation, with improvements in reasoning, multimodal understanding, agentic tool use, and long-context performance over 2.5 Flash Lite, plus four configurable thinking levels and a context window of 1M tokens.
import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-3.1-flash-lite',
  prompt: 'Why is the sky blue?',
})

Frequently Asked Questions
How is Gemini 3.1 Flash Lite different from google/gemini-3.1-flash-lite-preview?
Gemini 3.1 Flash Lite is the general-availability release of the same efficiency tier in the Gemini 3.1 family. The preview entry remains in the catalog for teams pinned to the earlier identifier; the GA model is the recommended target for new production work.
How does Gemini 3.1 Flash Lite compare to Gemini 2.5 Flash Lite?
Gemini 3.1 Flash Lite outperforms 2.5 Flash Lite on overall quality and lands close to 2.5 Flash across reasoning, multimodal understanding, agentic tool use, and long-context performance. For teams already running 2.5 Flash Lite at scale, it's a quality upgrade within the same lite tier.
What thinking levels does Gemini 3.1 Flash Lite support and how do they affect cost?
Four levels: minimal, low, medium, and high. Higher levels add reasoning compute that contributes to output token counts, so the choice trades off latency and per-request cost against quality on harder inputs.
Can I mix thinking levels across requests in the same application?
Yes. Set thinkingLevel per request in providerOptions.google.thinkingConfig. Routine requests can run at minimal while flagged hard cases run at medium or high without any architectural changes.
Does Gemini 3.1 Flash Lite support multimodal inputs?
Yes. Gemini 3.1 Flash Lite accepts text, images, audio, and documents as input within the 1M-token context window and returns text output. Web search and implicit caching are available as runtime options.
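The two answers above (per-request thinking levels and multimodal input) can be combined in a single request. The following is a minimal sketch, not an official recipe: the helper names, the `hard` flag, and the example.com image URL are hypothetical, and the message shape assumes the AI SDK's text and image content parts. Only the model identifier and the providerOptions.google.thinkingConfig.thinkingLevel path come from this page.

```typescript
// The four thinking levels listed in this FAQ.
type ThinkingLevel = 'minimal' | 'low' | 'medium' | 'high'

// Hypothetical routing rule: `hard` is an application-defined flag,
// not an API field. Routine traffic stays at the cheapest level.
function thinkingLevelFor(hard: boolean): ThinkingLevel {
  return hard ? 'high' : 'minimal'
}

// Assembles call arguments: one user message with a text part and an
// image part, plus the per-request thinking level.
function buildRequest(question: string, imageUrl: string, hard: boolean) {
  return {
    model: 'google/gemini-3.1-flash-lite',
    messages: [
      {
        role: 'user' as const,
        content: [
          { type: 'text' as const, text: question },
          { type: 'image' as const, image: new URL(imageUrl) },
        ],
      },
    ],
    providerOptions: {
      google: { thinkingConfig: { thinkingLevel: thinkingLevelFor(hard) } },
    },
  }
}

// Placeholder URL for illustration only.
const routine = buildRequest('What does this chart show?', 'https://example.com/chart.png', false)
const flagged = buildRequest('Reconcile the totals in this chart.', 'https://example.com/chart.png', true)
```

Under these assumptions, either object could be passed to streamText from the ai package, for example streamText(flagged). Keeping the level choice in one small function means the routing rule can change without touching call sites.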
How do I call Gemini 3.1 Flash Lite on AI Gateway?
Use the identifier google/gemini-3.1-flash-lite with the AI SDK, the OpenAI-compatible Chat Completions endpoint, the Responses API, or any other supported interface. AI Gateway handles provider routing, retries, and failover automatically.
How does Zero Data Retention work with Gemini 3.1 Flash Lite through AI Gateway?
Zero Data Retention is available for this model and is offered on a per-provider basis. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.
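For the OpenAI-compatible Chat Completions route mentioned under "How do I call Gemini 3.1 Flash Lite on AI Gateway?", a request might be assembled as in the sketch below. The endpoint URL and the bearer-token header are assumptions not stated on this page, and buildChatRequest is a hypothetical helper; confirm both against the AI Gateway documentation before use.

```typescript
// Assumed endpoint: this URL does not appear on this page.
const GATEWAY_CHAT_URL = 'https://ai-gateway.vercel.sh/v1/chat/completions'

// Hypothetical helper: builds the URL and fetch options without sending anything.
function buildChatRequest(apiKey: string, prompt: string) {
  return {
    url: GATEWAY_CHAT_URL,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`, // assumed auth scheme
        'Content-Type': 'application/json',
      },
      // Standard Chat Completions body: model identifier plus a message list.
      body: JSON.stringify({
        model: 'google/gemini-3.1-flash-lite',
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  }
}

// 'YOUR_API_KEY' is a placeholder, not a real credential.
const request = buildChatRequest('YOUR_API_KEY', 'Why is the sky blue?')
```

Sending it is then a matter of calling fetch(request.url, request.init) and reading the JSON response. Separating construction from sending keeps the request easy to inspect or log before any network call is made.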