Gemma 4 26B A4B IT is part of Google's Gemma 4 family, the open-weight counterpart to the proprietary Gemini lineup. Google released it on April 2, 2026 as an instruction-tuned mixture-of-experts (MoE) model built on the same architecture as Gemini 3.
The MoE design is the defining characteristic. Of the 26B total parameters, only roughly 4B are active during any single forward pass. A routing mechanism selects which expert sub-networks to activate for each input, so the model achieves quality comparable to a much larger dense model while using a fraction of the compute per token. This translates to lower latency and higher tokens-per-second throughput.
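The routing step described above can be sketched in a few lines. This is an illustrative top-k router in plain Python, not Gemma 4's actual implementation: it picks the k experts with the highest router scores for a token and softmax-normalizes their weights, so only those experts' parameters participate in the forward pass.

```python
import math

def top_k_route(router_logits, k=2):
    """Illustrative top-k expert routing: select the k highest-scoring
    experts and softmax-normalize their weights. The real router in
    Gemma 4 may use a different k, scoring, or load-balancing scheme."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    exps = [math.exp(router_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Toy example: 8 experts, route each token to the top 2.
logits = [0.1, 2.0, -0.5, 1.5, 0.0, 0.3, -1.0, 0.7]
print(top_k_route(logits))  # experts 1 and 3 carry this token
```

Because only the selected experts run, the per-token compute scales with the roughly 4B active parameters rather than the full 26B.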
Gemma 4 26B A4B IT accepts text and image inputs within a context window of 262.1K tokens and supports over 140 languages. It natively handles function calling, agentic workflows, structured JSON output, and system instructions. The instruction tuning (indicated by the IT suffix) means the model is ready for conversational and task-oriented use out of the box.
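To make the capability list concrete, here is a sketch of a chat request combining a system instruction, a structured-JSON constraint, and a tool definition. The field names follow the widely used OpenAI-compatible request shape, and the model ID and `lookup_weather` tool are assumptions for illustration, not confirmed API details:

```python
import json

# Hypothetical request payload for an OpenAI-compatible chat endpoint.
# Model ID, tool name, and exact field names are illustrative assumptions.
payload = {
    "model": "google/gemma-4-26b-a4b-it",
    "messages": [
        {"role": "system", "content": "Reply only with valid JSON."},
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    # Constrain the model to emit a JSON object.
    "response_format": {"type": "json_object"},
    # Declare a callable tool the model may invoke (hypothetical).
    "tools": [{
        "type": "function",
        "function": {
            "name": "lookup_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}
print(json.dumps(payload, indent=2))
```

In an agentic loop, the model's tool-call response would be executed client-side and the result appended to `messages` for the next turn.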
Running Gemma 4 26B A4B IT through AI Gateway provides unified billing, observability, automatic retries, and provider failover without requiring infrastructure management.
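The retry and failover behavior the gateway handles for you can be approximated client-side. This is a minimal sketch of the pattern, not the gateway's API: `providers` and `request_fn` are hypothetical stand-ins, and transient failures are retried with exponential backoff before falling over to the next provider.

```python
import time

def call_with_failover(providers, request_fn, max_retries=2):
    """Try each provider in order; retry transient (timeout) failures
    with exponential backoff before failing over. Sketch of what a
    gateway automates -- not a real gateway API."""
    for provider in providers:
        for attempt in range(max_retries + 1):
            try:
                return request_fn(provider)
            except TimeoutError:
                time.sleep(0.01 * 2 ** attempt)  # exponential backoff
    raise RuntimeError("all providers failed")

# Toy demo: the primary provider always times out, the fallback succeeds.
def fake_request(provider):
    if provider == "primary":
        raise TimeoutError
    return f"ok from {provider}"

print(call_with_failover(["primary", "fallback"], fake_request))
# → ok from fallback
```

With the gateway, this logic lives server-side, so application code issues a single request and gets unified billing and observability across providers for free.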