Gemini 3.1 Flash Lite
Gemini 3.1 Flash Lite is the GA release of the efficiency tier in the Gemini 3.1 generation, with improvements in reasoning, multimodal understanding, agentic tool use, and long-context performance over 2.5 Flash Lite, plus four configurable thinking levels and a context window of 1M tokens.
import { streamText } from 'ai'
const result = streamText({ model: 'google/gemini-3.1-flash-lite', prompt: 'Why is the sky blue?'})About Gemini 3.1 Flash Lite
Gemini 3.1 Flash Lite is the general-availability version of the efficiency tier in the Gemini 3.1 generation, released May 7, 2026. It outperforms Gemini 2.5 Flash Lite on overall quality and lands close to 2.5 Flash performance across key capability areas, including reasoning, multimodal understanding, agentic tool use, and long-context performance.
The model is positioned for high-volume use cases where unit economics, not peak capability, set the constraint. Gemini 3.1 Flash Lite accepts text, images, audio, and documents as input within 1M tokens and produces text output, with implicit caching and web search available as runtime options to control cost and ground responses in current information.
Four configurable thinking levels (minimal, low, medium, high) allow a single deployment to serve mixed workloads without routing across models. A bulk extraction job can run at minimal to minimize latency and tokens; an edge case in the same pipeline can step up to medium or high when more deliberation pays off. Thinking tokens contribute to output token counts, so the level becomes a direct lever on cost.
Compared to the preceding preview release, Gemini 3.1 Flash Lite is the stable surface for production deployments. Teams running 2.5 Flash Lite at scale get the 3.1 generation quality gains on the workloads that drive the most tokens, including translation, data extraction, and code completion, without moving up to the standard Flash tier.