Gemma 4 31B IT

Gemma 4 31B IT is Google's open-weight dense model with 31B parameters, all active during inference. Built on the Gemini 3 architecture, it targets higher output quality than its MoE sibling, with support for function-calling, structured JSON output, native vision, and 140+ languages.

Tool Use · Vision (Image) · File Input
index.ts
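import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemma-4-31b-it',
  prompt: 'Why is the sky blue?'
})

// Print each chunk of the response as it arrives
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}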

Playground

Try out Gemma 4 31B IT by Google. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

About Gemma 4 31B IT

Gemma 4 31B IT is the dense counterpart in Google's Gemma 4 family, released on April 2, 2026 alongside the mixture-of-experts Gemma 4 26B. While both share the Gemini 3 architecture, this model activates all 31B parameters during every forward pass.

The dense design means every parameter contributes to every prediction. This produces higher output quality on complex reasoning, generation, and analysis tasks compared to the MoE variant, where a routing mechanism selects a subset of parameters. The tradeoff is higher compute per token, which translates to increased latency and cost per request.

Gemma 4 31B IT accepts text and image inputs within a context window of 262.1K tokens, supports over 140 languages, and handles function-calling, agentic workflows, structured JSON output, and system instructions. The instruction-tuning (indicated by the it suffix) prepares the model for conversational and task-oriented use out of the box.
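As a rough sketch of what a combined text-and-image input can look like, the helper below builds a user message in the multi-part shape the AI SDK accepts for `messages` (a `text` part followed by an `image` part). The helper name `buildImageQuestion` and the example URL are hypothetical, and the types are declared locally only so the snippet stands alone.

```typescript
// Local types mirroring the AI SDK's multi-part user message shape.
// They are declared here only so the sketch is self-contained.
type TextPart = { type: 'text'; text: string }
type ImagePart = { type: 'image'; image: string }
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> }

// Hypothetical helper: pairs a question with one image URL.
function buildImageQuestion(question: string, imageUrl: string): UserMessage {
  return {
    role: 'user',
    content: [
      { type: 'text', text: question },
      { type: 'image', image: imageUrl },
    ],
  }
}

// Placeholder URL, not a real asset.
const messages: UserMessage[] = [
  buildImageQuestion('What does this chart show?', 'https://example.com/chart.png'),
]
```

Passing `messages` in place of `prompt` to `streamText`, with `model` set to `google/gemma-4-31b-it`, should send both parts in a single request.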

Running Gemma 4 31B IT through AI Gateway provides unified billing, observability, automatic retries, and provider failover across a single API surface.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

| Provider | Legal | Context | Latency | Throughput | Input | Output | Release Date |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Novita AI | Terms, Privacy | 262K | 0.8s | 7tps | $0.14/M | $0.40/M | 04/02/2026 |
| Parasail | Terms, Privacy | 262K | 0.6s | 6tps | $0.14/M | $0.40/M | 04/02/2026 |

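One way to express a provider preference from the AI SDK is through gateway provider options. The sketch below assumes the `order` and `only` fields under `providerOptions.gateway`, and assumes the slugs `novita` and `parasail` for the two providers listed here; copy the actual slugs from the page and check the docs before relying on either.

```typescript
// Shape assumed for AI Gateway routing options passed via `providerOptions`.
type GatewayOptions = { gateway: { order: string[]; only?: string[] } }

// Hypothetical helper: try providers in the given order; optionally restrict
// routing to exactly that set so no other provider is used as a fallback.
function preferProviders(slugs: string[], restrict = false): GatewayOptions {
  const order = [...slugs]
  return { gateway: restrict ? { order, only: [...slugs] } : { order } }
}

// Assumed slugs: prefer Parasail (lower listed latency), fall back to Novita AI.
const providerOptions = preferProviders(['parasail', 'novita'])
```

The resulting object can be passed as `providerOptions` alongside `model` and `prompt` in the `streamText` call.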
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Google

| Context | Latency | Throughput | Input | Output | Cache Read | Web Search (Per Query) | Providers | Release Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1M | 0.9s | 238tps | $0.25/M | $1.50/M | $0.03/M | $14.00/K + input costs | Google, Vertex | 03/03/2026 |
| 1M | 3.5s | 139tps | $2.00/M | $12.00/M | $0.2/M | $14.00/K + input costs | Google, Vertex | 02/19/2026 |
| 1M | 0.7s | 179tps | $0.50/M | $3.00/M | $0.05/M | $14.00/K + input costs | Google, Vertex | 12/17/2025 |
| 1M | 0.3s | 228tps | $0.10/M | $0.40/M | $0.01/M | $35.00/K + input costs | Google, Vertex | 06/17/2025 |
| 1M | 0.5s | 199tps | $0.30/M | $2.50/M | $0.03/M | $35.00/K + input costs | Google, Vertex | 03/20/2025 |
| 1M | 1.9s | 127tps | $1.25/M | $10.00/M | $0.13/M | $35.00/K + input costs | Google, Vertex | 03/20/2025 |

What To Consider When Choosing a Provider

  • Configuration: As a dense model with all parameters active, Gemma 4 31B IT uses more compute per token than the MoE Gemma 4 26B variant. Factor in the higher per-request cost and latency when evaluating provider variants for production traffic.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
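To make the per-request cost in the Configuration point concrete, the sketch below estimates spend from token counts using the rates listed for this model on both providers ($0.14 per million input tokens, $0.40 per million output tokens). The helper name and the example token counts are illustrative, and actual billing may differ.

```typescript
// Listed AI Gateway rates for Gemma 4 31B IT, in USD per million tokens.
const INPUT_USD_PER_MILLION = 0.14
const OUTPUT_USD_PER_MILLION = 0.40

// Hypothetical helper: estimates the cost of one request from its token counts.
function estimateRequestCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_USD_PER_MILLION +
    (outputTokens / 1_000_000) * OUTPUT_USD_PER_MILLION
  )
}

// Example: 10,000 input tokens and 1,000 output tokens.
// 0.0014 (input) + 0.0004 (output) = about $0.0018 per request.
const exampleCostUsd = estimateRequestCostUsd(10_000, 1_000)
```

Multiplying the per-request figure by expected daily request volume gives a quick way to compare this model against a lighter alternative before committing production traffic.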

When to Use Gemma 4 31B IT

Best For

  • Quality-critical generation tasks: You need the strongest output in the Gemma 4 family and can accept higher latency
  • Complex reasoning and analysis: Multi-step planning, code generation, and detailed document analysis
  • Multilingual applications: Serving users across 140+ languages with a single model
  • Vision-language tasks: Image understanding, visual Q&A, and document parsing within a context window of 262.1K tokens

Consider Alternatives When

  • Latency and throughput are the priority: If response time and throughput are your main constraints, the MoE Gemma 4 26B activates fewer parameters and responds faster
  • Native image or audio generation: You need media output, which Gemma 4 31B IT does not support
  • High-volume low-complexity inference: A smaller or lighter model is more cost-effective
  • Proprietary-grade benchmark performance: Gemini 3 Pro may be a better fit for the most demanding benchmarks

Conclusion

Gemma 4 31B IT is the quality-focused option in the Gemma 4 family. With all 31B parameters active during inference, it targets stronger output on complex tasks than the MoE Gemma 4 26B, at the cost of higher latency and spend per request. For teams that want open-weight flexibility with the highest reasoning quality the Gemma 4 generation offers, it is the right starting point on AI Gateway.

Frequently Asked Questions

  • What makes Gemma 4 31B IT different from the MoE Gemma 4 26B?

    Gemma 4 31B IT is a dense model, meaning all 31B parameters are active during every forward pass. The MoE Gemma 4 26B activates roughly 4B of its 26B total parameters per pass. Gemma 4 31B IT targets higher output quality; the 26B variant targets lower latency and cost.

  • What input modalities does Gemma 4 31B IT support?

    Gemma 4 31B IT accepts text and image inputs within a context window of 262.1K tokens. It does not generate images or audio.

  • How does Gemma 4 31B IT relate to Google's Gemini models?

    Gemma 4 31B IT is built on the same architecture as Gemini 3 but with open weights. It shares capabilities like function-calling, structured output, and system instructions. Gemini models remain proprietary; Gemma 4 31B IT lets you inspect or adapt the weights.

  • What languages does Gemma 4 31B IT support?

    Over 140 languages. The instruction-tuning covers multilingual conversational and task-oriented use cases.

  • How do I use Gemma 4 31B IT on AI Gateway?

    Set the model to google/gemma-4-31b-it in the AI SDK. AI Gateway handles provider routing, retries, and failover automatically.

  • Does Gemma 4 31B IT support function-calling?

    Yes. It supports function-calling for agentic workflows, structured JSON output, and system instructions natively, inherited from the Gemini 3 architecture.
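A function-calling request can be sketched as a plain request body. The example below assumes an OpenAI-style chat completions format with a `tools` array, which is one common way to describe tools; whether you send it through an OpenAI-compatible endpoint or use the AI SDK's own tool helpers is a choice to confirm against the docs. The `get_weather` tool and the helper `buildToolRequest` are hypothetical.

```typescript
// OpenAI-style tool definition; parameters are described with JSON Schema.
type ToolDefinition = {
  type: 'function'
  function: { name: string; description: string; parameters: Record<string, unknown> }
}

// Hypothetical tool used only for illustration.
const getWeatherTool: ToolDefinition = {
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Look up the current weather for a city',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  },
}

// Builds a chat completions request body that offers the tools to the model.
function buildToolRequest(userPrompt: string, tools: ToolDefinition[]) {
  return {
    model: 'google/gemma-4-31b-it',
    messages: [{ role: 'user', content: userPrompt }],
    tools,
  }
}

const body = buildToolRequest('Do I need an umbrella in Paris today?', [getWeatherTool])
```

If the model decides to use the tool, the response would carry a tool call naming `get_weather` with JSON arguments; your code runs the lookup and returns the result in a follow-up message so the model can finish its answer.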