
Gemma 4 26B A4B IT

Gemma 4 26B A4B IT is Google's open-weight mixture-of-experts model with 26B total parameters and roughly 4B active per forward pass. Built on the Gemini 3 architecture, it supports function-calling, structured JSON output, native vision, and 140+ languages within a context window of 262.1K tokens.

Vision (Image), Tool Use, File Input
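index.ts
import { streamText } from 'ai';

// Model ID in AI Gateway's creator/model format.
const result = streamText({
  model: 'google/gemma-4-26b-a4b-it',
  prompt: 'Why is the sky blue?',
});

// Print the response as it streams in.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}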

Playground

Try out Gemma 4 26B A4B IT by Google. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider          Context  Latency  Throughput  Input    Output   Cache Read  Cache Write  Web Search (Per Query)  Release Date
Novita AI         262K     1.2s     37tps       $0.13/M  $0.40/M  -           -            -                       04/02/2026
Parasail          262K     0.7s     60tps       $0.13/M  $0.40/M  -           -            -                       04/02/2026
Google Vertex AI  262K     0.6s     93tps       $0.15/M  $0.60/M  $0.01/M     -            -                       04/02/2026
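
As a rough illustration of how the per-million-token prices in the provider table turn into a per-request cost, here is a minimal sketch. The `ProviderPrice` type and the `estimateCostUsd` helper are hypothetical names introduced for this example, the prices are a snapshot copied from the table, and cache read discounts are ignored.

```typescript
// Prices in USD per million tokens, copied from the provider table.
// These are a snapshot and may change.
interface ProviderPrice {
  name: string;
  inputPerM: number;
  outputPerM: number;
}

const providers: ProviderPrice[] = [
  { name: 'Novita AI', inputPerM: 0.13, outputPerM: 0.4 },
  { name: 'Parasail', inputPerM: 0.13, outputPerM: 0.4 },
  { name: 'Google Vertex AI', inputPerM: 0.15, outputPerM: 0.6 },
];

// Hypothetical helper: cost of a single request, ignoring cache discounts.
function estimateCostUsd(
  price: ProviderPrice,
  inputTokens: number,
  outputTokens: number,
): number {
  return (
    (inputTokens * price.inputPerM + outputTokens * price.outputPerM) /
    1_000_000
  );
}

// Example: a request with 10,000 input tokens and 1,000 output tokens.
for (const p of providers) {
  console.log(`${p.name}: $${estimateCostUsd(p, 10_000, 1_000).toFixed(4)}`);
}
```

For that example request, the two lower-priced providers come to about $0.0017 and Google Vertex AI to about $0.0021, so the difference only becomes material at high request volumes.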
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
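
The two metrics can be combined into a back-of-the-envelope estimate of how long a full response takes: time to first token plus output tokens divided by throughput. The sketch below assumes the P50 figures from the provider table (in seconds and tokens per second) and a hypothetical `estimateResponseSeconds` helper. Real requests will vary with load and prompt length.

```typescript
// Hypothetical helper: approximate wall-clock time for a streamed response.
// ttftSeconds: P50 time to first token; tps: P50 output tokens per second.
function estimateResponseSeconds(
  ttftSeconds: number,
  tps: number,
  outputTokens: number,
): number {
  if (tps <= 0) {
    throw new RangeError('tps must be positive');
  }
  return ttftSeconds + outputTokens / tps;
}

// Snapshot values from the provider table, for a 500-token response.
const vertexSeconds = estimateResponseSeconds(0.6, 93, 500);
const novitaSeconds = estimateResponseSeconds(1.2, 37, 500);

console.log(`Google Vertex AI: ~${vertexSeconds.toFixed(1)}s`);
console.log(`Novita AI: ~${novitaSeconds.toFixed(1)}s`);
```

With these snapshot numbers, throughput dominates for longer outputs: a 500-token response takes roughly 6 seconds on the fastest listed provider and roughly 15 seconds on the slowest.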

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Google

Model             Context  Latency  Throughput  Input    Output    Cache Read  Cache Write  Web Search (Per Query)   Providers       Release Date
(name not shown)  1M       2.2s     174tps      $1.50/M  $9.00/M   $0.15/M     -            $14.00/K + input costs   Google, Vertex  05/19/2026
(name not shown)  1M       0.4s     230tps      $0.25/M  $1.50/M   $0.03/M     -            $14.00/K + input costs   Google, Vertex  03/03/2026
(name not shown)  1M       3.3s     102tps      $2.00/M  $12.00/M  $0.20/M     -            $14.00/K + input costs   Google, Vertex  02/19/2026
(name not shown)  1M       0.7s     155tps      $0.50/M  $3.00/M   $0.05/M     -            $14.00/K + input costs   Google, Vertex  12/17/2025
(name not shown)  1M       0.3s     256tps      $0.10/M  $0.40/M   $0.01/M     -            $35.00/K + input costs   Google, Vertex  06/17/2025
(name not shown)  1M       1.8s     132tps      $1.25/M  $10.00/M  $0.13/M     -            $35.00/K + input costs   Google, Vertex  03/20/2025

About Gemma 4 26B A4B IT

Gemma 4 26B A4B IT is part of Google's Gemma 4 family, the open-weight counterpart to the proprietary Gemini lineup. Google released it on April 2, 2026 as an instruction-tuned mixture-of-experts (MoE) model built on the same architecture as Gemini 3.

The MoE design is the defining characteristic. Of the 26B total parameters, only roughly 4B are active during any single forward pass. A routing mechanism selects which expert sub-networks to activate for each input, so Gemma 4 26B A4B IT achieves quality comparable to a much larger dense model while using a fraction of the compute per token. This translates to lower latency and higher throughput. See live metrics on this page for current throughput.
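
To make the routing idea concrete, here is a toy sketch of generic top-k expert routing: a router scores every expert, only the k highest-scoring experts run, and their outputs are weighted by renormalized scores. This is an illustration of the general technique, not Gemma's actual router. The expert count, the value of k, and the logits are all made up for the example.

```typescript
// Toy top-k router. NOT Gemma's implementation; expert count and k are
// illustrative assumptions.
interface ExpertWeight {
  expert: number; // index of the selected expert
  weight: number; // mixing weight after renormalization
}

// Numerically stable softmax over router logits.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Keep the k most probable experts and rescale their weights to sum to 1.
function routeTopK(routerLogits: number[], k: number): ExpertWeight[] {
  const top = softmax(routerLogits)
    .map((weight, expert) => ({ expert, weight }))
    .sort((a, b) => b.weight - a.weight)
    .slice(0, k);
  const total = top.reduce((sum, e) => sum + e.weight, 0);
  return top.map((e) => ({ expert: e.expert, weight: e.weight / total }));
}

// Eight hypothetical experts, two active per token.
const chosen = routeTopK([0.1, 2.0, -1.0, 1.5, 0.3, -0.5, 0.0, 0.9], 2);
console.log(chosen.map((e) => e.expert)); // experts 1 and 3 are selected

// Share of parameters active per forward pass, from the figures above.
console.log(`${((4 / 26) * 100).toFixed(1)}% of parameters active`);
```

Only the selected experts' weights participate in the computation for that token, which is why the per-token compute tracks the roughly 4B active parameters rather than the 26B total.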

Gemma 4 26B A4B IT accepts text and image inputs within a context window of 262.1K tokens and supports over 140 languages. It natively handles function-calling, agentic workflows, structured JSON output, and system instructions. The model is instruction-tuned (the IT suffix, written -it in the model ID), so it is ready for conversational and task-oriented use out of the box.
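
As a sketch of what a combined text and image input can look like, the snippet below builds a user message with one text part and one image part. The local types mirror the message-part shape used by the AI SDK (`type: 'text'` and `type: 'image'`), but they are declared here so the example is self-contained. The `buildVisionMessage` helper and the example URL are hypothetical.

```typescript
// Local types that mirror the text and image message parts; declared here
// only to keep the example self-contained.
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image'; image: URL };

interface UserMessage {
  role: 'user';
  content: Array<TextPart | ImagePart>;
}

// Hypothetical helper: pair a question with an image reference.
function buildVisionMessage(question: string, imageUrl: string): UserMessage {
  return {
    role: 'user',
    content: [
      { type: 'text', text: question },
      { type: 'image', image: new URL(imageUrl) },
    ],
  };
}

const visionMessage = buildVisionMessage(
  'What does this chart show?',
  'https://example.com/chart.png', // placeholder URL
);
console.log(visionMessage.content.map((part) => part.type));
```

A message of this shape would be passed in a `messages` array in place of the plain `prompt` string when calling the model.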

Running Gemma 4 26B A4B IT through AI Gateway provides unified billing, observability, automatic retries, and provider failover without requiring infrastructure management.

What To Consider When Choosing a Provider

  • Configuration: Latency and throughput differ by provider (see the Providers table), so test candidates against your workload before committing to one at production scale.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
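
One way to act on these considerations is to rank providers from measured metrics and hand the result to the gateway as a preference order. The sketch below is an assumption-laden illustration: the slugs (`novita`, `parasail`, `vertex`) are guesses and should be replaced with the slugs copied from the Providers table, the metrics are a snapshot from that table, and the `{ gateway: { order } }` shape is assumed to match the provider options that AI Gateway accepts. Check the documentation before relying on it.

```typescript
// Snapshot metrics from the Providers table. Slugs are hypothetical.
interface ProviderMetrics {
  slug: string;
  ttftSeconds: number;
  tps: number;
}

const metrics: ProviderMetrics[] = [
  { slug: 'novita', ttftSeconds: 1.2, tps: 37 },
  { slug: 'parasail', ttftSeconds: 0.7, tps: 60 },
  { slug: 'vertex', ttftSeconds: 0.6, tps: 93 },
];

// Hypothetical helper: drop providers slower than maxTtftSeconds, then
// order the rest by throughput, fastest first.
function preferredOrder(
  candidates: ProviderMetrics[],
  maxTtftSeconds: number,
): string[] {
  return candidates
    .filter((p) => p.ttftSeconds <= maxTtftSeconds)
    .sort((a, b) => b.tps - a.tps)
    .map((p) => p.slug);
}

// Assumed option shape for expressing the preference to the gateway.
const providerOptions = {
  gateway: { order: preferredOrder(metrics, 1.0) },
};

console.log(providerOptions.gateway.order); // [ 'vertex', 'parasail' ]
```

Keeping more than one provider in the order leaves room for the gateway's failover to use the next entry if the first is unavailable.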

When to Use Gemma 4 26B A4B IT

Best For

  • Latency-sensitive production workloads: The MoE architecture's lower compute per token translates to faster response times
  • Cost-efficient agentic pipelines: Function-calling and structured output at high request volumes
  • Multilingual applications: Serving users across 140+ languages with a single model
  • Vision-language tasks: Image understanding, visual Q&A, and document analysis within a context window of 262.1K tokens
  • Open-weight workloads: Cases where the ability to inspect model weights matters

Consider Alternatives When

  • Highest output quality needed: When latency is not a constraint, the dense Gemma 4 31B or a Gemini model may be more appropriate
  • Native image or audio generation: Gemma 4 26B A4B IT does not produce media output
  • Simple classification or extraction: A smaller, cheaper model is sufficient for straightforward workloads

Conclusion

Gemma 4 26B A4B IT provides Gemini 3-class capabilities in an open-weight package optimized for throughput. The MoE architecture keeps inference fast and affordable. For teams that need strong multilingual, multimodal reasoning without proprietary lock-in, it is a practical production choice on AI Gateway.