Gemini 3.1 Flash Lite Preview
Gemini 3.1 Flash Lite Preview is the efficiency-focused model in the Gemini 3.1 generation, built for budget-constrained, high-volume workloads. It delivers notable gains in translation, data extraction, and code completion over Gemini 2.5 Flash Lite, and offers four configurable thinking levels.
```ts
import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-3.1-flash-lite-preview',
  prompt: 'Why is the sky blue?',
})
```

Frequently Asked Questions
What are the four thinking levels and what do they mean for cost and quality?
The four levels are `minimal`, `low`, `medium`, and `high`. Lower levels reduce the amount of reasoning compute applied before generating a response, which decreases latency and token consumption but may reduce quality on complex tasks. `high` applies the most reasoning, similar to configuring a reasoning model for thorough inference.

Which task categories saw the most improvement over Gemini 2.5 Flash Lite?
Translation, data extraction, and code completion saw the largest improvements over Gemini 2.5 Flash Lite. These are the high-volume task categories where the efficiency gains of the 3.1 generation have the most practical impact.
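As an illustration of a high-volume extraction workload, a structured-output call might look like the following sketch. It uses the AI SDK's `generateObject` helper with a zod schema; the invoice fields shown are hypothetical, so substitute your own.

```ts
import { generateObject } from 'ai'
import { z } from 'zod'

// Hypothetical invoice schema; replace with the fields you need.
const invoiceSchema = z.object({
  vendor: z.string(),
  total: z.number(),
  currency: z.string(),
})

// Extract structured data from free-form text in one cheap call.
const { object } = await generateObject({
  model: 'google/gemini-3.1-flash-lite-preview',
  schema: invoiceSchema,
  prompt:
    'Extract the vendor, total, and currency from: "ACME Corp. Total due: 1250.00 EUR"',
})
```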
Is this model suitable for agentic multi-agent architectures?
Yes. High-volume agentic tasks are a primary target. The model's low cost and configurable thinking levels make it appropriate for sub-agents in hierarchical agent systems.
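As a sketch of that pattern, a hierarchical system might route its cheap, repetitive sub-agent calls to Flash Lite at a low thinking level. The classification task below is a hypothetical example of such a sub-agent step.

```ts
import { generateText } from 'ai'

// Sub-agent step: high-volume triage with minimal reasoning overhead.
const subAgent = await generateText({
  model: 'google/gemini-3.1-flash-lite-preview',
  prompt:
    'Classify this support ticket as billing, technical, or other: "My invoice total is wrong."',
  providerOptions: {
    google: { thinkingConfig: { thinkingLevel: 'low' } },
  },
})
```

A more capable orchestrator model can then consume `subAgent.text` without the sub-agents driving up cost.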
Can I mix thinking levels in the same application?
Yes. You set `thinkingLevel` per request in `providerOptions.google.thinkingConfig`, so different request types within the same application can use different levels without any architectural changes.

Does Gemini 3.1 Flash Lite Preview support streaming?
Yes. Use `streamText` from the AI SDK with `model: 'google/gemini-3.1-flash-lite-preview'` to stream responses.

How does Gemini 3.1 Flash Lite Preview differ from Gemini 3 Flash?
Gemini 3 Flash prioritizes pro-grade reasoning at flash speed and is positioned as the standard speed/quality balance point. Flash Lite is specifically optimized for maximum cost efficiency and high-volume throughput, trading some capability headroom for a lower price point.
Is `includeThoughts` supported on this model?

Yes. Set `includeThoughts: true` in `providerOptions.google.thinkingConfig` to stream the model's reasoning tokens alongside the generated response.

What's the recommended thinking level for bulk translation tasks?
Start with `low` or `minimal` for straightforward translation tasks where throughput is the primary concern. Increase to `medium` for content requiring cultural nuance or domain-specific accuracy.
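Putting the pieces of this FAQ together, a bulk-translation request at a low thinking level might be sketched as follows. The prompt text is illustrative.

```ts
import { generateText } from 'ai'

const { text } = await generateText({
  model: 'google/gemini-3.1-flash-lite-preview',
  prompt: 'Translate to German: "The parcel will arrive on Tuesday."',
  providerOptions: {
    google: {
      thinkingConfig: {
        thinkingLevel: 'minimal', // raise to 'medium' for nuanced content
        includeThoughts: false, // set true to surface reasoning tokens
      },
    },
  },
})
```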