Gemma 4 26B A4B IT
Gemma 4 26B A4B IT is Google's open-weight mixture-of-experts model with 26B total parameters and roughly 4B active per forward pass. Built on the Gemini 3 architecture, it supports function-calling, structured JSON output, native vision, and 140+ languages within a context window of 262.1K tokens.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemma-4-26b-a4b-it',
  prompt: 'Why is the sky blue?',
})

for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}
```

Playground
Try out Gemma 4 26B A4B IT by Google. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
About Gemma 4 26B A4B IT
Gemma 4 26B A4B IT is part of Google's Gemma 4 family, the open-weight counterpart to the proprietary Gemini lineup. Google released it on April 2, 2026 as an instruction-tuned mixture-of-experts (MoE) model built on the same architecture as Gemini 3.
The MoE design is the defining characteristic. Of the 26B total parameters, only roughly 4B are active during any single forward pass. A routing mechanism selects which expert sub-networks to activate for each input, so the model achieves quality comparable to a much larger dense model while using a fraction of the compute per token. This translates to lower latency and higher tokens-per-second throughput.
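The routing idea above can be sketched in a few lines. This is an illustrative top-k gating function, not Gemma's actual router: it scores every expert for a token and activates only the k highest-scoring ones, which is why most of the 26B parameters stay idle on any given forward pass.

```typescript
// Illustrative top-k expert routing. `scores` are router logits,
// one per expert; only the k best experts run for this token.
function topKRoute(scores: number[], k: number): number[] {
  return scores
    .map((s, i) => [s, i] as const)
    .sort((a, b) => b[0] - a[0]) // highest score first
    .slice(0, k)
    .map(([, i]) => i) // keep expert indices only
}

// With 8 experts and k = 2, a quarter of the expert capacity
// is active; the rest contributes no compute for this token.
const active = topKRoute([0.1, 2.3, -0.5, 1.7, 0.0, 0.9, -1.2, 0.4], 2)
// → [1, 3]
```

The compute saving is proportional to k divided by the number of experts, which is where the "roughly 4B of 26B active" figure comes from.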
Gemma 4 26B A4B IT accepts text and image inputs within a context window of 262.1K tokens and supports over 140 languages. It handles function-calling, agentic workflows, structured JSON output, and system instructions natively. The instruction-tuning (indicated by the it suffix) means the model is ready for conversational and task-oriented use out of the box.
Running Gemma 4 26B A4B IT through AI Gateway provides unified billing, observability, automatic retries, and provider failover without requiring infrastructure management.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
Per-provider metrics shown for this model:

- P50 throughput on live AI Gateway traffic, in tokens per second (TPS)
- P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds
- Direct request success rate, on AI Gateway overall and per provider

Visit the docs for more info.
What To Consider When Choosing a Provider
- Configuration: Evaluate whether the MoE architecture's latency and throughput characteristics fit your workload before selecting a provider variant at production scale.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use Gemma 4 26B A4B IT
Best For
- Latency-sensitive production workloads: The MoE architecture's lower compute-per-token translates to faster response times
- Cost-efficient agentic pipelines: Need function-calling and structured output at high request volumes
- Multilingual applications: Serving users across 140+ languages with a single model
- Vision-language tasks: Image understanding, visual Q&A, and document analysis within a context window of 262.1K tokens
- Open-weight workloads: The ability to inspect model weights matters
Consider Alternatives When
- Highest output quality needed: Latency is not a constraint, and the dense Gemma 4 31B or a Gemini model may be more appropriate
- Native image or audio generation: Your task requires media output, which Gemma 4 26B A4B IT does not support
- Simple classification or extraction: A smaller, cheaper model is sufficient for straightforward workloads
Conclusion
Gemma 4 26B A4B IT provides Gemini 3-class capabilities in an open-weight package optimized for throughput. The MoE architecture keeps inference fast and affordable. For teams that need strong multilingual, multimodal reasoning without proprietary lock-in, it is a practical production choice on AI Gateway.
Frequently Asked Questions
What does mixture-of-experts mean for Gemma 4 26B A4B IT?
Gemma 4 26B A4B IT has 26B total parameters split across expert sub-networks. A routing mechanism activates roughly 4B parameters per forward pass, selecting the most relevant experts for each input. This reduces compute per token compared to a dense model of equivalent total size.
How does Gemma 4 26B A4B IT compare to the dense Gemma 4 31B?
Gemma 4 26B A4B IT prioritizes latency and throughput by activating fewer parameters per token. The dense Gemma 4 31B activates all 31B parameters, targeting higher output quality at the cost of more compute. Choose Gemma 4 26B A4B IT when speed matters and the dense variant when quality is the priority.
What input modalities does Gemma 4 26B A4B IT support?
Gemma 4 26B A4B IT accepts text and image inputs. It does not generate images or audio. Use it for text generation, visual understanding, and structured output tasks.
What languages does Gemma 4 26B A4B IT support?
Over 140 languages. The instruction-tuning covers multilingual conversational and task-oriented use cases.
How do I use Gemma 4 26B A4B IT on AI Gateway?
Set the model to `google/gemma-4-26b-a4b-it` in the AI SDK. AI Gateway handles provider routing, retries, and failover automatically.
Does Gemma 4 26B A4B IT support function-calling and structured output?
Yes. It supports function-calling for agentic workflows, structured JSON output, and system instructions natively, sharing these capabilities with the Gemini 3 architecture it is built on.