Gemma 4 26B A4B IT
Gemma 4 26B A4B IT is Google's open-weight mixture-of-experts model with 26B total parameters and roughly 4B active per forward pass. Built on the Gemini 3 architecture, it supports function-calling, structured JSON output, native vision, and 140+ languages within a context window of 262.1K tokens.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemma-4-26b-a4b-it',
  prompt: 'Why is the sky blue?',
})

for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}
```

Playground
Try out Gemma 4 26B A4B IT by Google. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
About Gemma 4 26B A4B IT
Gemma 4 26B A4B IT is part of Google's Gemma 4 family, the open-weight counterpart to the proprietary Gemini lineup. Google released it on April 2, 2026 as an instruction-tuned mixture-of-experts (MoE) model built on the same architecture as Gemini 3.
The MoE design is the defining characteristic. Of the 26B total parameters, only roughly 4B are active during any single forward pass. A routing mechanism selects which expert sub-networks to activate for each input, so the model achieves quality comparable to a much larger dense model while using a fraction of the compute per token. This translates to lower latency and higher tokens-per-second throughput.
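The routing idea above can be sketched in a few lines. This is an illustrative top-k gating function, not Gemma's actual router: it scores every expert for a token and activates only the k highest-scoring ones, which is why most of the 26B parameters stay idle on any given forward pass.

```typescript
// Illustrative top-k expert routing. `scores` are router logits,
// one per expert; only the k best experts run for this token.
function topKRoute(scores: number[], k: number): number[] {
  return scores
    .map((s, i) => [s, i] as const)
    .sort((a, b) => b[0] - a[0]) // highest score first
    .slice(0, k)
    .map(([, i]) => i) // keep expert indices only
}

// With 8 experts and k = 2, a quarter of the expert capacity
// is active; the rest contributes no compute for this token.
const active = topKRoute([0.1, 2.3, -0.5, 1.7, 0.0, 0.9, -1.2, 0.4], 2)
// → [1, 3]
```

The compute saving is proportional to k divided by the number of experts, which is where the "roughly 4B of 26B active" figure comes from.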
Gemma 4 26B A4B IT accepts text and image inputs within a context window of 262.1K tokens and supports over 140 languages. It handles function-calling, agentic workflows, structured JSON output, and system instructions natively. The instruction-tuning (indicated by the it suffix) means the model is ready for conversational and task-oriented use out of the box.
Running Gemma 4 26B A4B IT through AI Gateway provides unified billing, observability, automatic retries, and provider failover without requiring infrastructure management.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
Per-provider metrics shown for this model:

- P50 throughput on live AI Gateway traffic, in tokens per second (TPS)
- P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds
- Direct request success rate, on AI Gateway overall and per provider

Visit the docs for more info.
What To Consider When Choosing a Provider
- Configuration: Evaluate whether the MoE architecture's latency and throughput characteristics fit your workload before selecting a provider variant at production scale.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use Gemma 4 26B A4B IT
Best For
- Latency-sensitive production workloads: The MoE architecture's lower compute-per-token translates to faster response times
- Cost-efficient agentic pipelines: Need function-calling and structured output at high request volumes
- Multilingual applications: Serving users across 140+ languages with a single model
- Vision-language tasks: Image understanding, visual Q&A, and document analysis within a context window of 262.1K tokens
- Open-weight workloads: The ability to inspect model weights matters
Consider Alternatives When
- Highest output quality needed: Latency is not a constraint, and the dense Gemma 4 31B or a Gemini model may be more appropriate
- Native image or audio generation: Your task requires media output, which Gemma 4 26B A4B IT does not support
- Simple classification or extraction: A smaller, cheaper model is sufficient for straightforward workloads
Conclusion
Gemma 4 26B A4B IT provides Gemini 3-class capabilities in an open-weight package optimized for throughput. The MoE architecture keeps inference fast and affordable. For teams that need strong multilingual, multimodal reasoning without proprietary lock-in, it is a practical production choice on AI Gateway.
Frequently Asked Questions
What does mixture-of-experts mean for Gemma 4 26B A4B IT?
Gemma 4 26B A4B IT has 26B total parameters split across expert sub-networks. A routing mechanism activates roughly 4B parameters per forward pass, selecting the most relevant experts for each input. This reduces compute per token compared to a dense model of equivalent total size.
How does Gemma 4 26B A4B IT compare to the dense Gemma 4 31B?
Gemma 4 26B A4B IT prioritizes latency and throughput by activating fewer parameters per token. The dense Gemma 4 31B activates all 31B parameters, targeting higher output quality at the cost of more compute. Choose Gemma 4 26B A4B IT when speed matters and the dense variant when quality is the priority.
What input modalities does Gemma 4 26B A4B IT support?
Gemma 4 26B A4B IT accepts text and image inputs. It does not generate images or audio. Use it for text generation, visual understanding, and structured output tasks.
What languages does Gemma 4 26B A4B IT support?
Over 140 languages. The instruction-tuning covers multilingual conversational and task-oriented use cases.
How do I use Gemma 4 26B A4B IT on AI Gateway?
Set the model to `google/gemma-4-26b-a4b-it` in the AI SDK. AI Gateway handles provider routing, retries, and failover automatically.
Does Gemma 4 26B A4B IT support function-calling and structured output?
Yes. It supports function-calling for agentic workflows, structured JSON output, and system instructions natively, sharing these capabilities with the Gemini 3 architecture it is built on.