Gemma 4 31B IT
Gemma 4 31B IT is Google's open-weight dense model with 31B parameters, all active during inference. Built on the Gemini 3 architecture, it targets higher output quality than its MoE sibling, with support for function-calling, structured JSON output, native vision, and 140+ languages.
import { streamText } from 'ai'
const result = streamText({ model: 'google/gemma-4-31b-it', prompt: 'Why is the sky blue?' })
Playground
Try out Gemma 4 31B IT by Google. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
- Throughput: P50 throughput on live AI Gateway traffic, in tokens per second (TPS).
- Time to first token: P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds.
- Success rate: Direct request success rate on AI Gateway and per provider.
Visit the docs for more info.
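To make the P50 figures concrete, the sketch below computes a median from a handful of per-request measurements. The sample values and the helper names (`percentile`, `tokensPerSecond`) are hypothetical, and the nearest-rank method is an assumption; this page does not say how AI Gateway computes its percentiles.

```typescript
// Nearest-rank percentile. This is one common definition, assumed here
// for illustration; AI Gateway's exact method is not stated on this page.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('percentile needs at least one sample');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Throughput for one response: output tokens divided by generation time.
function tokensPerSecond(outputTokens: number, generationMs: number): number {
  if (generationMs <= 0) throw new Error('generationMs must be positive');
  return outputTokens / (generationMs / 1000);
}

// Hypothetical measurements, not real gateway data.
const ttftSamplesMs = [420, 380, 510, 450, 1200];
const tpsSamples = [
  tokensPerSecond(600, 12000), // 50 TPS
  tokensPerSecond(300, 5000),  // 60 TPS
  tokensPerSecond(900, 20000), // 45 TPS
];

const p50TtftMs = percentile(ttftSamplesMs, 50); // 450
const p50Tps = percentile(tpsSamples, 50);       // 50
```

Note that the single 1200 ms sample does not move the median, which is one reason a P50 is a useful summary when occasional requests are slow.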
About Gemma 4 31B IT
Gemma 4 31B IT is the dense counterpart in Google's Gemma 4 family, released on April 2, 2026 alongside the mixture-of-experts Gemma 4 26B. While both share the Gemini 3 architecture, this model activates all 31B parameters during every forward pass.
The dense design means every parameter contributes to every prediction. It targets higher output quality on complex reasoning, generation, and analysis tasks than the MoE variant, where a routing mechanism selects a subset of parameters for each token. The tradeoff is higher compute per token, which translates to increased latency and cost per request.
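The compute tradeoff can be sketched with a common rule of thumb: a forward pass costs roughly 2 FLOPs per active parameter per token. This is an approximation, not a figure from Google. The active-parameter count for the MoE variant is not given on this page, so `hypotheticalMoeActiveParams` below is a placeholder chosen only to show the arithmetic.

```typescript
// Rule of thumb (an approximation, not a vendor figure): a transformer
// forward pass costs about 2 FLOPs per active parameter per generated token.
function forwardFlopsPerToken(activeParams: number): number {
  return 2 * activeParams;
}

const denseActiveParams = 31e9; // from the text: all 31B parameters are active

// HYPOTHETICAL placeholder: this page does not state how many parameters the
// MoE variant activates per token. Replace it with the published figure.
const hypotheticalMoeActiveParams = 4e9;

const denseFlops = forwardFlopsPerToken(denseActiveParams); // 6.2e10
const moeFlops = forwardFlopsPerToken(hypotheticalMoeActiveParams);

// Ratio of per-token compute, valid only for the placeholder value above.
const computeRatio = denseFlops / moeFlops;
```

Real latency and cost also depend on memory bandwidth, batching, and provider pricing, so the ratio is only a first-order intuition.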
Gemma 4 31B IT accepts text and image inputs within a context window of 262.1K tokens, supports over 140 languages, and handles function-calling, agentic workflows, structured JSON output, and system instructions. The instruction-tuning (indicated by the "it" suffix) prepares the model for conversational and task-oriented use out of the box.
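As an illustration of combining a system instruction with text and image input, the sketch below assembles a request object. The local types mirror the message-part shape used by the AI SDK as best understood here; `buildVisionRequest`, the system prompt, and the image URL are hypothetical, and the field names should be checked against the current AI SDK documentation.

```typescript
// Local types that mirror the AI SDK's message-part shape (text and image
// parts). They are declared here so the sketch has no dependencies.
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image'; image: string }; // URL or base64 string
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> };

interface VisionRequest {
  model: string;
  system: string;
  messages: UserMessage[];
}

// Hypothetical helper: bundles a system instruction, a question, and one image.
function buildVisionRequest(question: string, imageUrl: string): VisionRequest {
  return {
    model: 'google/gemma-4-31b-it',
    system: 'You are a concise assistant. Answer in one short paragraph.',
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: question },
          { type: 'image', image: imageUrl },
        ],
      },
    ],
  };
}

const visionRequest = buildVisionRequest(
  'What does this chart show?',
  'https://example.com/chart.png', // placeholder URL
);
```

An object of this shape is meant to be passed to `streamText` in place of the `prompt` field shown in the snippet at the top of the page.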
Running Gemma 4 31B IT through AI Gateway provides unified billing, observability, automatic retries, and provider failover across a single API surface.
What To Consider When Choosing a Provider
- Configuration: As a dense model with all parameters active, Gemma 4 31B IT uses more compute per token than the MoE Gemma 4 26B variant. Factor in the higher per-request cost and latency when evaluating provider variants for production traffic.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
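The authentication and provider-preference points above can be sketched as a small settings builder. This covers only the API key path, not OIDC. The environment variable name `AI_GATEWAY_API_KEY` and the `providerOptions.gateway.order` shape are assumptions based on the AI Gateway documentation as understood here; `buildGatewaySettings` and the provider slugs are hypothetical.

```typescript
// Settings this sketch assembles. `providerOptions.gateway.order` is the
// assumed shape for expressing a provider preference; verify it in the docs.
interface GatewaySettings {
  apiKey: string;
  model: string;
  providerOptions: { gateway: { order: string[] } };
}

// `env` is passed in (rather than read from process.env directly) so the
// function stays easy to test. AI_GATEWAY_API_KEY is the assumed variable name.
function buildGatewaySettings(
  env: Record<string, string | undefined>,
  preferredProviders: string[],
): GatewaySettings {
  const apiKey = env['AI_GATEWAY_API_KEY'];
  if (!apiKey) {
    throw new Error('AI_GATEWAY_API_KEY is not set');
  }
  return {
    apiKey,
    model: 'google/gemma-4-31b-it',
    providerOptions: { gateway: { order: preferredProviders } },
  };
}

// 'provider-a' and 'provider-b' are placeholder slugs, not real provider names.
const settings = buildGatewaySettings(
  { AI_GATEWAY_API_KEY: 'example-key' },
  ['provider-a', 'provider-b'],
);
```

Because the gateway holds the provider credentials, the only secret this sketch handles is the gateway key itself.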
When to Use Gemma 4 31B IT
Best For
- Quality-critical generation tasks: You need the strongest output in the Gemma 4 family and can accept higher latency
- Complex reasoning and analysis: Multi-step planning, code generation, and detailed document analysis
- Multilingual applications: Serving users across 140+ languages with a single model
- Vision-language tasks: Image understanding, visual Q&A, and document parsing within a context window of 262.1K tokens
Consider Alternatives When
- Latency and throughput come first: When speed is your primary constraint, the MoE Gemma 4 26B activates fewer parameters and responds faster
- Native image or audio generation: You need media output, which Gemma 4 31B IT does not support
- High-volume low-complexity inference: A smaller or lighter model is more cost-effective
- Proprietary-grade benchmark performance: Gemini 3 Pro may be a better fit for the most demanding benchmarks
Conclusion
Gemma 4 31B IT is the quality-focused option in the Gemma 4 family. With all 31B parameters active during inference, it targets stronger output on complex tasks than the MoE Gemma 4 26B, at higher latency and cost per request. For teams that want open-weight flexibility with the highest reasoning quality the Gemma 4 generation offers, it is the right starting point on AI Gateway.