Gemma 4 31B IT

Gemma 4 31B IT is Google's open-weight dense model with 31B parameters, all active during inference. Built on the Gemini 3 architecture, it targets higher output quality than its MoE sibling, with support for function-calling, structured JSON output, native vision, and 140+ languages.

Capabilities: Tool Use, Vision (Image), File Input

index.ts

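import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemma-4-31b-it',
  prompt: 'Why is the sky blue?',
})

// Print the response as it streams in
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}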


Playground

Try out Gemma 4 31B IT by Google. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
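As a sketch of setting a provider preference in code: the AI Gateway docs describe a `gateway` entry under the AI SDK's `providerOptions`, where `order` lists preferred provider slugs and `only` restricts routing to them. The helper below assumes that shape; `gatewayPreference` is a hypothetical name, and `provider-a`/`provider-b` are placeholder slugs, not the real providers for this model.

```typescript
// Assumed shape of AI Gateway provider options: `order` sets a preference,
// `only` restricts routing to the listed providers.
interface GatewayProviderOptions {
  gateway: {
    order?: string[];
    only?: string[];
  };
}

// Hypothetical helper: build provider options from a preferred order,
// optionally restricting routing to exactly those providers.
function gatewayPreference(
  slugs: string[],
  restrictToListed = false,
): GatewayProviderOptions {
  if (slugs.length === 0) {
    throw new Error('Provide at least one provider slug');
  }
  return restrictToListed
    ? { gateway: { order: [...slugs], only: [...slugs] } }
    : { gateway: { order: [...slugs] } };
}

// 'provider-a' and 'provider-b' are placeholders; copy real slugs from the
// Providers table.
const providerOptions = gatewayPreference(['provider-a', 'provider-b']);
console.log(JSON.stringify(providerOptions));

// Usage with the AI SDK (not run here):
// streamText({ model: 'google/gemma-4-31b-it', prompt, providerOptions })
```

Setting `restrictToListed` to `true` trades failover breadth for tighter control over which providers receive your requests.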

Legal	Context	Latency	Throughput	Input	Output	Release Date
Terms, Privacy	262K	0.8s	28tps	$0.14/M	$0.40/M	04/02/2026
Terms, Privacy	262K	0.5s	82tps	$0.14/M	$0.40/M	04/02/2026
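Both listed providers quote the same per-token prices, so a request's cost can be estimated from its token counts alone. A minimal sketch, assuming `/M` means per million tokens and ignoring cache, web search, and per-query charges; `estimateCostUsd` is a hypothetical helper.

```typescript
// Per-million-token prices from the Providers table (both listed
// providers quote the same rates).
const INPUT_USD_PER_M = 0.14;
const OUTPUT_USD_PER_M = 0.4;

// Estimate the USD cost of one request from its token counts.
// Ignores caching, web search, and any per-query fees.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError('Token counts must be non-negative');
  }
  return (
    (inputTokens / 1_000_000) * INPUT_USD_PER_M +
    (outputTokens / 1_000_000) * OUTPUT_USD_PER_M
  );
}

// A 10,000-token prompt with a 1,000-token answer:
// 0.01 * $0.14 + 0.001 * $0.40 = $0.0014 + $0.0004
console.log(estimateCostUsd(10_000, 1_000).toFixed(4)); // prints 0.0018
```

At these rates, the $5 of free credits covers about 12.5M output tokens ($5 / $0.40 per million).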

More models by Google

Latency	Throughput	Input	Output	Cache Read	Web Search	Release Date
2.7s	287tps	$1.50/M	$9.00/M	$0.15/M	$14.00/K + input costs	05/19/2026
0.7s	251tps	$0.25/M	$1.50/M	$0.03/M	$14.00/K + input costs	03/03/2026
4.9s	376tps	$2.00/M	$12.00/M	$0.20/M	$14.00/K + input costs	02/19/2026
0.6s	179tps	$0.50/M	$3.00/M	$0.05/M	$14.00/K + input costs	12/17/2025
0.3s	272tps	$0.10/M	$0.40/M	$0.01/M	$35.00/K + input costs	06/17/2025
0.4s	199tps	$0.30/M	$2.50/M	$0.03/M	$35.00/K + input costs	03/20/2025

About Gemma 4 31B IT

Gemma 4 31B IT is the dense counterpart in Google's Gemma 4 family, released on April 2, 2026 alongside the mixture-of-experts Gemma 4 26B. While both share the Gemini 3 architecture, this model activates all 31B parameters during every forward pass.

The dense design means every parameter contributes to every prediction. Compared with the MoE variant, where a router selects a subset of experts for each token, this produces higher output quality on complex reasoning, generation, and analysis tasks. The tradeoff is higher compute per token, which translates to increased latency and cost per request.
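The compute gap between the dense and MoE variants can be estimated with the common rule of thumb of about 2 FLOPs per active parameter per generated token. This approximation ignores attention cost over long contexts and is only a back-of-envelope sketch. An MoE model still has to hold all of its parameters in memory, so the saving is in compute per token, not in weight storage.

```typescript
// Rough forward-pass cost: ~2 FLOPs per active parameter per token.
// A rule of thumb only; it ignores attention over the context.
function approxGflopsPerToken(activeParams: number): number {
  return (2 * activeParams) / 1e9;
}

const dense31b = approxGflopsPerToken(31e9); // all 31B parameters active
const moe26b = approxGflopsPerToken(4e9); // roughly 4B of 26B active per token

console.log(`dense: ~${dense31b} GFLOPs/token`); // ~62
console.log(`MoE:   ~${moe26b} GFLOPs/token`); // ~8
console.log(`ratio: ~${(dense31b / moe26b).toFixed(2)}x`); // ~7.75x
```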

Gemma 4 31B IT accepts text and image inputs within a context window of 262.1K tokens, supports over 140 languages, and handles function-calling, agentic workflows, structured JSON output, and system instructions. The instruction-tuning (indicated by the IT suffix) prepares the model for conversational and task-oriented use out of the box.
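Because text and image inputs share one context window, it helps to check that a prompt plus the requested output length fits before sending a request. This sketch assumes "262.1K" corresponds to 262,144 tokens and that `promptTokens` already includes any tokens consumed by images; both are assumptions to verify against provider documentation, and the helper names are hypothetical.

```typescript
// Assumption: "262.1K" is read as 262,144 tokens (2^18).
const CONTEXT_WINDOW_TOKENS = 262_144;

// Tokens left over after the prompt and the requested output.
// A negative result means the request would overflow the window.
function headroomTokens(
  promptTokens: number,
  maxOutputTokens: number,
  contextWindow = CONTEXT_WINDOW_TOKENS,
): number {
  return contextWindow - promptTokens - maxOutputTokens;
}

function fitsInContext(promptTokens: number, maxOutputTokens: number): boolean {
  return headroomTokens(promptTokens, maxOutputTokens) >= 0;
}

console.log(fitsInContext(200_000, 8_000)); // true
console.log(fitsInContext(260_000, 8_000)); // false
```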

Running Gemma 4 31B IT through AI Gateway provides unified billing, observability, automatic retries, and provider failover across a single API surface.

What To Consider When Choosing a Provider

Configuration: As a dense model with all parameters active, Gemma 4 31B IT uses more compute per token than the MoE Gemma 4 26B variant. Factor in the higher per-request cost and latency when evaluating provider variants for production traffic.
Zero Data Retention: AI Gateway supports Zero Data Retention for this model on requests sent directly through the gateway; requests that use your own provider keys (BYOK) are not covered. See the documentation for configuration details.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
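To weigh latency against throughput when comparing the two listed providers, a rough wall-clock estimate adds the latency figure to the time needed to stream the output at the listed throughput. This assumes latency means time to first token and throughput means output tokens per second, and it ignores network jitter and load; `estimateSeconds` is a hypothetical helper using the Providers table figures.

```typescript
// Figures from the Providers table. Assumed meanings: latency is time to
// first token, throughput is output tokens per second.
interface ProviderStats {
  latencySeconds: number;
  tokensPerSecond: number;
}

const listed: ProviderStats[] = [
  { latencySeconds: 0.8, tokensPerSecond: 28 },
  { latencySeconds: 0.5, tokensPerSecond: 82 },
];

// Rough wall-clock estimate for generating `outputTokens` tokens.
function estimateSeconds(p: ProviderStats, outputTokens: number): number {
  return p.latencySeconds + outputTokens / p.tokensPerSecond;
}

for (const p of listed) {
  // For a 500-token answer this prints 18.7, then 6.6
  console.log(estimateSeconds(p, 500).toFixed(1));
}
```

For long answers throughput dominates: at 500 output tokens the faster provider finishes in roughly a third of the time, while the 0.3s latency difference barely registers.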

When to Use Gemma 4 31B IT

Best For

Quality-critical generation tasks: You need the strongest output in the Gemma 4 family and can accept higher latency
Complex reasoning and analysis: Multi-step planning, code generation, and detailed document analysis
Multilingual applications: Serving users across 140+ languages with a single model
Vision-language tasks: Image understanding, visual Q&A, and document parsing within a context window of 262.1K tokens
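A visual Q&A request pairs a text part with an image part in a single user message. The types below are local stand-ins assumed to match the AI SDK's `{ type: 'text' }` / `{ type: 'image' }` content-part shape; with the `ai` package installed, prefer its exported types. `visualQuestion` and the image URL are hypothetical placeholders.

```typescript
// Local stand-ins for the AI SDK's user-message content parts (assumed shape).
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image'; image: URL };
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> };

// Build a single visual Q&A message: the question followed by the image.
function visualQuestion(question: string, imageUrl: string): UserMessage {
  return {
    role: 'user',
    content: [
      { type: 'text', text: question },
      { type: 'image', image: new URL(imageUrl) }, // throws on a malformed URL
    ],
  };
}

// Hypothetical image URL for illustration.
const message = visualQuestion(
  'What is the total on this invoice?',
  'https://example.com/invoice.png',
);
console.log(message.content.length); // 2

// Usage with the AI SDK (not run here):
// const { text } = await generateText({
//   model: 'google/gemma-4-31b-it',
//   system: 'Answer using only what is visible in the image.',
//   messages: [message],
// });
```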

Consider Alternatives When

Latency and throughput are the priority: The MoE Gemma 4 26B activates fewer parameters per token and responds faster
Native image or audio generation: You need media output, which Gemma 4 31B IT does not support
High-volume, low-complexity inference: A smaller or lighter model is more cost-effective
Proprietary-grade benchmark performance: Gemini 3 Pro may be a better fit for the most demanding benchmarks

Conclusion

Gemma 4 31B IT is the quality-focused option in the Gemma 4 family. With all 31B parameters active during inference, it delivers stronger output on complex tasks. For teams that want open-weight flexibility with the highest reasoning quality the Gemma 4 generation offers, it is the right starting point on AI Gateway.

Frequently Asked Questions

What makes Gemma 4 31B IT different from the MoE Gemma 4 26B?
Gemma 4 31B IT is a dense model, meaning all 31B parameters are active during every forward pass. The MoE Gemma 4 26B activates roughly 4B of its 26B total parameters per pass. Gemma 4 31B IT targets higher output quality; the 26B variant targets lower latency and cost.
What input modalities does Gemma 4 31B IT support?
Gemma 4 31B IT accepts text and image inputs within a context window of 262.1K tokens. It does not generate images or audio.
How does Gemma 4 31B IT relate to Google's Gemini models?
Gemma 4 31B IT is built on the same architecture as Gemini 3 but with open weights. It shares capabilities like function-calling, structured output, and system instructions. Gemini models remain proprietary; Gemma 4 31B IT lets you inspect or adapt the weights.
What languages does Gemma 4 31B IT support?
Over 140 languages. The instruction-tuning covers multilingual conversational and task-oriented use cases.
How do I use Gemma 4 31B IT on AI Gateway?
Set the model to `google/gemma-4-31b-it` in the AI SDK. AI Gateway handles provider routing, retries, and failover automatically.
Does Gemma 4 31B IT support function-calling?
Yes. It natively supports function-calling for agentic workflows, along with structured JSON output and system instructions, capabilities inherited from the Gemini 3 architecture.
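Even with structured output enabled, it is worth validating the model's JSON before using it. A minimal sketch with a hand-written type guard; `WeatherAnswer` is a hypothetical schema chosen for illustration, and in an AI SDK project a schema library such as Zod would typically replace the manual checks.

```typescript
// Hypothetical shape the model was asked to return as JSON.
interface WeatherAnswer {
  city: string;
  temperatureC: number;
}

// Type guard: checks that an unknown value has the expected fields.
function isWeatherAnswer(value: unknown): value is WeatherAnswer {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.city === 'string' && typeof v.temperatureC === 'number';
}

// Parse the model's text output; return null instead of throwing if the
// JSON is malformed or does not match the expected shape.
function parseWeatherAnswer(text: string): WeatherAnswer | null {
  try {
    const parsed: unknown = JSON.parse(text);
    return isWeatherAnswer(parsed) ? parsed : null;
  } catch {
    return null;
  }
}

console.log(parseWeatherAnswer('{"city":"Paris","temperatureC":18}'));
console.log(parseWeatherAnswer('not json')); // null
```

Returning `null` instead of throwing lets the caller decide whether to retry the request or fall back.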
