Mixtral MoE 8x22B Instruct
Mixtral MoE 8x22B Instruct is a sparse mixture-of-experts model with 141B total parameters and 39B active per forward pass, offering a context window of 65.5K tokens, native function calling, and Apache 2.0 licensing.
```typescript
import { streamText } from 'ai';

// Stream a completion from Mixtral 8x22B Instruct through the AI Gateway.
const result = streamText({
  model: 'mistral/mixtral-8x22b-instruct',
  prompt: 'Why is the sky blue?',
});

// Print tokens as they arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```
What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
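With an API key, setup reduces to a single environment variable. The variable name below is an assumption for illustration; check the gateway documentation for the exact name your deployment expects:

```shell
# Assumed variable name; verify against the AI Gateway documentation.
export AI_GATEWAY_API_KEY="your-api-key-here"
```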
The context window of 65.5K tokens enables Mixtral MoE 8x22B Instruct to process large documents, multi-file codebases, or long conversation histories in a single request, making it well suited for applications that regularly hit context limits on smaller models.
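As a rough planning aid, a simple character-count heuristic can flag inputs likely to exceed the window before a request is sent. The ~4 characters per token ratio below is a common approximation, not Mixtral's actual tokenizer, so treat the result as an estimate only:

```typescript
// Rough token estimate: ~4 characters per token (heuristic, not the real tokenizer).
const CONTEXT_WINDOW = 65_536;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Reserve headroom for the model's reply when checking whether input fits.
function fitsInContext(text: string, replyBudget = 2_048): boolean {
  return estimateTokens(text) + replyBudget <= CONTEXT_WINDOW;
}
```

For example, a 280,000-character document estimates to ~70,000 tokens and would be rejected, while a short prompt passes.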
When to Use Mixtral MoE 8x22B Instruct
Best For
Complex multilingual reasoning:
Across English, French, Italian, German, and Spanish
Native function calling:
Applications requiring structured output constraints
Long-document processing:
Multi-file analysis within the context window of 65.5K tokens
Mathematics and quantitative reasoning:
Quantitative problem solving at the 90.8% GSM8K benchmark level
Apache 2.0 open-weight licensing:
Teams that need permissive, commercial-friendly terms on a large, capable model
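Native function calling means the model can emit structured tool calls rather than free text, which the application then executes. The dispatcher below is a minimal sketch of the application side, assuming a simplified `{ name, arguments }` call shape and hypothetical tools; the AI SDK's actual wire format and tool definitions differ:

```typescript
// Simplified tool-call shape for illustration; real SDK payloads differ.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Hypothetical tools the model might be permitted to call.
const tools: Record<string, (args: Record<string, unknown>) => string> = {
  getWeather: (args) => `Weather in ${args.city}: sunny`,
  add: (args) => String(Number(args.a) + Number(args.b)),
};

// Look up and run the tool the model requested.
function dispatch(call: ToolCall): string {
  const tool = tools[call.name];
  if (!tool) throw new Error(`Unknown tool: ${call.name}`);
  return tool(call.arguments);
}
```

A call such as `dispatch({ name: 'add', arguments: { a: 2, b: 3 } })` returns the string `'5'`, which would be fed back to the model as the tool result.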
Consider Alternatives When
Newer large-scale MoE:
You want Mistral AI Large 3's 675B total / 41B active architecture
Explicit reasoning traces:
Your workload requires step-by-step reasoning output (consider Magistral models)
Vision capabilities:
Your workload requires image input (consider Pixtral Large)
Conclusion
Mixtral MoE 8x22B Instruct combines 141B total parameters, 39B active per forward pass, Apache 2.0 licensing, and native function calling. It delivers faster inference than dense 70B models at this capability tier and remains an option for teams that need a large open-weight model with commercial-friendly terms.
FAQ
What is Mixtral MoE 8x22B Instruct?
A sparse Mixture-of-Experts model with 8 expert networks, 141B total parameters, and 39B active parameters per forward pass.
What is the context window?
65.5K tokens.
Does it support function calling?
Yes. Native function calling is included in the instruct variant.
How does it perform on math benchmarks?
90.8% on GSM8K (maj@8) and 44.6% on MATH (maj@4).
What license does it use?
Apache 2.0, Mistral AI's most permissive open-source license, allowing commercial use and redistribution.
Why is inference faster than a dense model of similar size?
The sparse MoE architecture activates only 39B of 141B total parameters per token, giving it a throughput profile closer to a 39B dense model while drawing on a much larger parameter space.
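The routing idea can be illustrated with a toy gate: each token's gate scores select the top 2 of 8 experts, so only a fraction of the expert parameters run per token (shared components such as attention push the real active share, 39B of 141B, slightly above 2/8). This is a schematic sketch, not Mixtral's actual routing code:

```typescript
// Toy top-2 gating over 8 experts (schematic; not Mixtral's implementation).
const NUM_EXPERTS = 8;
const ACTIVE_EXPERTS = 2;

// Pick the indices of the top-k gate scores for one token.
function topKExperts(gateScores: number[], k = ACTIVE_EXPERTS): number[] {
  return gateScores
    .map((score, idx) => ({ score, idx }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((e) => e.idx);
}

// Fraction of expert parameters active per token: 2 of 8 experts.
const activeFraction = ACTIVE_EXPERTS / NUM_EXPERTS; // 0.25
```

For gate scores `[0.1, 0.9, 0.2, 0.8, 0, 0, 0, 0]`, the router selects experts 1 and 3; the other six experts are skipped entirely for that token.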