Gemma 4 31B IT
Gemma 4 31B IT is Google's open-weight dense model with 31B parameters, all active during inference. Built on the Gemini 3 architecture, it targets higher output quality than its MoE sibling, with support for function-calling, structured JSON output, native vision, and 140+ languages.
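import { streamText } from 'ai'
const result = streamText({ model: 'google/gemma-4-31b-it', prompt: 'Why is the sky blue?' })
for await (const text of result.textStream) process.stdout.write(text) // print text as it streams
What To Consider When Choosing a Provider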
Zero Data Retention
AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
As a dense model with all parameters active, Gemma 4 31B IT uses more compute per token than the MoE Gemma 4 26B variant. Factor in the higher per-request cost and latency when evaluating provider variants for production traffic.
When to Use Gemma 4 31B IT
Best For
Quality-critical generation tasks:
You need the strongest output in the Gemma 4 family and can accept higher latency
Complex reasoning and analysis:
Multi-step planning, code generation, and detailed document analysis
Multilingual applications:
Serving users across 140+ languages with a single model
Vision-language tasks:
Image understanding, visual Q&A, and document parsing within a context window of 262.1K tokens
Consider Alternatives When
Latency and throughput are the priority:
The MoE Gemma 4 26B activates fewer parameters per pass and responds faster
Native image or audio generation:
You need media output, which Gemma 4 31B IT does not support
High-volume low-complexity inference:
A smaller or lighter model is more cost-effective
Proprietary-grade benchmark performance:
Gemini 3 Pro may be a better fit for the most demanding benchmarks
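The guidance above can be written down as a small decision helper. This is only a sketch: the `Requirements` flags, the `suggestModel` function, and the order in which the checks run are illustrative assumptions, not part of AI Gateway or the AI SDK.

```typescript
// Hypothetical requirement flags; the names are illustrative only.
interface Requirements {
  needsMediaOutput: boolean        // image or audio generation
  latencyCritical: boolean         // latency and throughput dominate
  highVolumeSimpleTasks: boolean   // bulk, low-complexity inference
  needsTopBenchmarkScores: boolean // proprietary-grade benchmark results
}

// Returns a plain-language suggestion that mirrors the lists above.
// The priority order of the checks is an assumption made for this sketch.
function suggestModel(req: Requirements): string {
  if (req.needsMediaOutput) return 'a model with image or audio output'
  if (req.needsTopBenchmarkScores) return 'Gemini 3 Pro'
  if (req.highVolumeSimpleTasks) return 'a smaller or lighter model'
  if (req.latencyCritical) return 'Gemma 4 26B (MoE)'
  return 'Gemma 4 31B IT'
}
```

A team with no special constraints falls through to Gemma 4 31B IT, which matches its role as the quality-focused default in the family.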
Conclusion
Gemma 4 31B IT is the quality-focused option in the Gemma 4 family. With all 31B parameters active during inference, it targets stronger output on complex tasks than the MoE 26B variant, at the cost of more compute per token. For teams that want open-weight flexibility with the highest reasoning quality the Gemma 4 generation offers, it is the right starting point on AI Gateway.
FAQ
Gemma 4 31B IT is a dense model, meaning all 31B parameters are active during every forward pass. The MoE Gemma 4 26B activates roughly 4B of its 26B total parameters per pass. Gemma 4 31B IT targets higher output quality; the 26B variant targets lower latency and cost.
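To get a feel for the size of that gap, a common rule of thumb puts forward-pass compute at about 2 FLOPs per active parameter per token. The sketch below applies that rule to the figures quoted above. The rule itself is an approximation brought in for illustration: it ignores attention cost, expert routing overhead, and memory bandwidth, so the result is a rough ratio and not a latency or price prediction.

```typescript
// Active parameter counts taken from the text above.
const DENSE_ACTIVE_PARAMS = 31e9 // Gemma 4 31B IT: all parameters active
const MOE_ACTIVE_PARAMS = 4e9    // Gemma 4 26B: "roughly 4B" active per pass

// Rule-of-thumb estimate (an assumption, not a measured figure):
// about 2 FLOPs per active parameter per generated token.
function approxForwardFlopsPerToken(activeParams: number): number {
  return 2 * activeParams
}

const computeRatio =
  approxForwardFlopsPerToken(DENSE_ACTIVE_PARAMS) /
  approxForwardFlopsPerToken(MOE_ACTIVE_PARAMS)

console.log(`Dense vs. MoE compute per token: ~${computeRatio}x`)
```

Under these assumptions the dense model does several times more arithmetic per token, which is the intuition behind its higher per-request cost and latency.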
Gemma 4 31B IT accepts text and image inputs within a context window of 262.1K tokens. It does not generate images or audio.
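When sending long documents, it can help to check the token budget before making a request. The sketch below assumes the window is shared between input and output tokens and uses the rounded 262.1K figure from the text; the exact limit may differ, and `remainingTokens` and `fitsInContext` are hypothetical helpers. Token counts are expected to come from whatever tokenizer or usage data you have available.

```typescript
// Rounded figure from the text; treat the exact limit as an assumption.
const CONTEXT_WINDOW_TOKENS = 262_100

// Tokens left after the prompt and the space reserved for the reply.
// A negative result means the request is over budget.
function remainingTokens(promptTokens: number, reservedOutputTokens: number): number {
  return CONTEXT_WINDOW_TOKENS - promptTokens - reservedOutputTokens
}

// True when the prompt plus the reserved output fits in the window.
function fitsInContext(promptTokens: number, reservedOutputTokens: number): boolean {
  return remainingTokens(promptTokens, reservedOutputTokens) >= 0
}
```

For example, a 200,000-token prompt with 8,000 tokens reserved for output fits, while a 260,000-token prompt with the same reservation does not.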
Gemma 4 31B IT is built on the same architecture as Gemini 3 but with open weights. It shares capabilities like function-calling, structured output, and system instructions. Gemini models remain proprietary; Gemma 4 31B IT lets you inspect or adapt the weights.
Gemma 4 31B IT supports over 140 languages. Its instruction tuning covers multilingual conversational and task-oriented use cases.
Set the model to google/gemma-4-31b-it in the AI SDK. AI Gateway handles provider routing, retries, and failover automatically.
Gemma 4 31B IT natively supports function-calling for agentic workflows, structured JSON output, and system instructions, inherited from the Gemini 3 architecture.
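Even with structured JSON output, it is usually worth validating a reply before acting on it. The sketch below parses a JSON string and checks it against a made-up `WeatherQuery` shape. Both the shape and the `parseWeatherQuery` helper are hypothetical and not defined by the model, the AI SDK, or AI Gateway.

```typescript
// Hypothetical structured reply used only for this illustration.
interface WeatherQuery {
  city: string
  unit: 'celsius' | 'fahrenheit'
}

// Returns a typed object when the raw text is valid JSON with the
// expected fields, and null otherwise.
function parseWeatherQuery(raw: string): WeatherQuery | null {
  let value: unknown
  try {
    value = JSON.parse(raw)
  } catch {
    return null // not valid JSON
  }
  if (typeof value !== 'object' || value === null) return null
  const record = value as Record<string, unknown>

  const city = record.city
  if (typeof city !== 'string' || city.length === 0) return null

  const unit =
    record.unit === 'celsius' ? 'celsius'
    : record.unit === 'fahrenheit' ? 'fahrenheit'
    : null
  if (unit === null) return null

  return { city, unit }
}
```

Returning `null` on any mismatch keeps the calling code simple: it can retry the request or fall back without handling partially valid data.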