Gemini 3.1 Flash Lite Preview

Gemini 3.1 Flash Lite Preview is the efficiency-focused model in the Gemini 3.1 generation, aimed at budget-constrained, high-volume workloads. It improves on Gemini 2.5 Flash Lite most notably in translation, data extraction, and code completion, and it offers four configurable thinking levels.

Reasoning, Tool Use, Implicit Caching, File Input, Vision (Image), Web Search

index.ts

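import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-3.1-flash-lite-preview',
  prompt: 'Why is the sky blue?'
})

// Print the response to stdout as it streams in.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}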


About Gemini 3.1 Flash Lite Preview

Gemini 3.1 Flash Lite Preview is Google's most cost-efficient model in the 3.1 generation, designed explicitly for high-volume agentic tasks, data extraction pipelines, and latency-sensitive applications where budget is the primary constraint. This model outperforms Gemini 2.5 Flash Lite on overall quality, with the most pronounced improvements in translation, data extraction, and code completion, three task categories that commonly drive the highest request volumes in production.

The four-level thinking configuration (minimal, low, medium, high) lets a single model deployment serve heterogeneous workloads without switching models: a bulk extraction job might run at minimal thinking to keep latency and cost down, while an edge-case translation that depends on cultural nuance runs at medium. For teams running large-scale pipelines (content localization, automated data cleaning, code completion at IDE scale, or classification across millions of documents), Gemini 3.1 Flash Lite Preview provides the quality improvements of the 3.1 generation without the cost profile of the Pro or standard Flash tiers. Its position in the lineup is defined by throughput economics rather than maximum capability.
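The per-workload choice of thinking level described above can be sketched as a small lookup. This is a minimal illustration, not a documented API. The four level names and the two pairings (bulk extraction at minimal, nuanced translation at medium) come from the text. The workload names, the other two pairings, the helpers `thinkingLevelFor` and `requestOptions`, and the `providerOptions.google.thinkingConfig.thinkingLevel` shape are assumptions to check against the provider documentation.

```typescript
// The four thinking levels named in the text above.
type ThinkingLevel = 'minimal' | 'low' | 'medium' | 'high'

// Hypothetical workload labels, used only for illustration.
type Workload =
  | 'bulk-extraction'
  | 'classification'
  | 'code-completion'
  | 'nuanced-translation'

// Bulk extraction and nuanced translation follow the text;
// the other two pairings are assumptions.
const LEVEL_BY_WORKLOAD: Record<Workload, ThinkingLevel> = {
  'bulk-extraction': 'minimal',
  'classification': 'minimal',
  'code-completion': 'low',
  'nuanced-translation': 'medium'
}

function thinkingLevelFor(workload: Workload): ThinkingLevel {
  return LEVEL_BY_WORKLOAD[workload]
}

// Builds a plain options object for a single request.
// The providerOptions shape is an assumed convention, not confirmed by the source.
function requestOptions(workload: Workload, prompt: string) {
  return {
    model: 'google/gemini-3.1-flash-lite-preview',
    prompt,
    providerOptions: {
      google: {
        thinkingConfig: { thinkingLevel: thinkingLevelFor(workload) }
      }
    }
  }
}
```

If the option shape matches what your SDK version accepts, the object returned by `requestOptions` could be passed to `streamText` in place of the inline options, so one deployment handles every workload and only the thinking level changes per request.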
