Mixtral MoE 8x22B Instruct
Mixtral MoE 8x22B Instruct is a sparse mixture-of-experts model with 141B total parameters and 39B active per forward pass, offering a context window of 65.5K tokens, native function calling, and Apache 2.0 licensing.
```ts
import { streamText } from 'ai'

const result = streamText({
  model: 'mistral/mixtral-8x22b-instruct',
  prompt: 'Why is the sky blue?',
})
```
Playground
Try out Mixtral MoE 8x22B Instruct by Mistral AI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
About Mixtral MoE 8x22B Instruct
Released April 17, 2024, Mixtral MoE 8x22B Instruct is Mistral AI's largest open-weight model. It uses a sparse Mixture-of-Experts (SMoE) architecture with eight expert networks totaling 141B parameters, of which 39B activate per forward pass. This design lets the model outperform dense 70B models in inference speed while drawing on a larger total parameter space for complex reasoning.
The instruct variant adds native function calling and a constrained output mode on La Plateforme, both important for structured, agentic applications. Multilingual fluency covers English, French, Italian, German, and Spanish. Mixtral MoE 8x22B Instruct scores 90.8% on GSM8K (maj@8) and 44.6% on the MATH benchmark (maj@4).
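As a rough illustration of the function-calling path through the AI Gateway, the sketch below assumes AI SDK 5, where tools are declared with `inputSchema`; the `getWeather` tool, its Zod schema, and its stubbed result are hypothetical examples for this page, not part of Mistral's or the Gateway's API.
```ts
import { generateText, tool } from 'ai'
import { z } from 'zod'

const result = await generateText({
  model: 'mistral/mixtral-8x22b-instruct',
  tools: {
    // Hypothetical tool: the name, schema, and stubbed result are illustration only.
    getWeather: tool({
      description: 'Get the current weather for a city',
      inputSchema: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, conditions: 'clear', tempC: 21 }),
    }),
  },
  prompt: 'What is the weather in Paris right now?',
})

console.log(result.toolCalls)   // structured calls emitted by the model
console.log(result.toolResults) // results from the execute handler above
```
This runs a single generation step; how the tool results are folded back into a final text answer is left to the calling code.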
Released under Apache 2.0, Mistral AI's most permissive license, Mixtral MoE 8x22B Instruct can be used commercially, modified, and redistributed without restriction.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
The provider table reports three metrics for each provider serving this model, measured on live AI Gateway traffic:
- P50 throughput, in tokens per second (TPS)
- P50 time to first token (TTFT), in milliseconds
- Direct request success rate, on AI Gateway and per-provider
Visit the docs for more info.
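To steer routing toward particular providers, the Gateway accepts an ordering hint. The sketch below assumes the `providerOptions.gateway.order` setting described in the AI Gateway docs; the slugs shown are placeholders to be replaced with real slugs copied from the table above.
```ts
import { streamText } from 'ai'

const result = streamText({
  model: 'mistral/mixtral-8x22b-instruct',
  prompt: 'Summarize the attached contract in three bullet points.',
  // Assumed Gateway routing option: try providers in this order.
  // The slugs below are placeholders; copy real slugs from the provider table.
  providerOptions: {
    gateway: {
      order: ['provider-slug-a', 'provider-slug-b'],
    },
  },
})

for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}
```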
More models by Mistral AI
What To Consider When Choosing a Provider
- Configuration: The context window of 65.5K tokens enables Mixtral MoE 8x22B Instruct to process large documents, multi-file codebases, or long conversation histories in a single request, well suited for applications that regularly hit context limits on smaller models.
- Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token, so you do not need to manage provider credentials directly (a brief sketch of the API-key path follows this list).
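As a minimal sketch of the API-key path, assuming the `@ai-sdk/gateway` package's `createGateway` helper and the `AI_GATEWAY_API_KEY` environment variable (assumptions drawn from the Gateway docs, not from this page):
```ts
import { createGateway } from '@ai-sdk/gateway'
import { generateText } from 'ai'

// Assumed setup: pass the key explicitly instead of relying on the default lookup.
const gateway = createGateway({
  apiKey: process.env.AI_GATEWAY_API_KEY,
})

const { text } = await generateText({
  model: gateway('mistral/mixtral-8x22b-instruct'),
  prompt: 'Why is the sky blue?',
})

console.log(text)
```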
When to Use Mixtral MoE 8x22B Instruct
Best For
- Complex multilingual reasoning: Across English, French, Italian, German, and Spanish
- Native function calling: Applications requiring tool use or structured output constraints (a structured-output sketch follows this list)
- Long-document processing: Multi-file analysis within the context window of 65.5K tokens
- Mathematics and quantitative reasoning: Problem solving at the 90.8% GSM8K benchmark level
- Apache 2.0 open-weight licensing: Teams that need a permissive license for a large, capable open-weight model
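For the structured-output case, here is a minimal sketch using the AI SDK's `generateObject` with a Zod schema. The `invoiceSchema` shape is a made-up example, and whether the Gateway routes this through Mistral's constrained output mode depends on the provider handling the request.
```ts
import { generateObject } from 'ai'
import { z } from 'zod'

// Hypothetical schema used only to illustrate constrained output.
const invoiceSchema = z.object({
  vendor: z.string(),
  totalEur: z.number(),
  lineItems: z.array(
    z.object({ description: z.string(), amountEur: z.number() })
  ),
})

const { object } = await generateObject({
  model: 'mistral/mixtral-8x22b-instruct',
  schema: invoiceSchema,
  prompt: 'Extract the invoice details: "Acme SARL, 3 chairs at 40 EUR each, total 120 EUR."',
})

console.log(object.vendor, object.totalEur)
```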
Consider Alternatives When
- Newer large-scale MoE: You want Mistral AI Large 3's 675B total / 41B active architecture
- Explicit reasoning traces: Your workload requires step-by-step reasoning output (consider Magistral models)
- Vision capabilities: Your workload requires image input (consider Pixtral Large)
Conclusion
Mixtral MoE 8x22B Instruct combines 141B total parameters, 39B active per forward pass, Apache 2.0 licensing, and native function calling. It delivers faster inference than dense 70B models at this capability tier and remains an option for teams that need a large open-weight model with commercial-friendly terms.
Frequently Asked Questions
What is the architecture of Mixtral MoE 8x22B Instruct?
A sparse Mixture-of-Experts model with 8 expert networks, 141B total parameters, and 39B active parameters per forward pass.
What is the context window?
65.5K tokens.
Does Mixtral MoE 8x22B Instruct support function calling?
Yes. Native function calling is included in the instruct variant.
What are the math benchmark scores?
90.8% on GSM8K (maj@8) and 44.6% on MATH (maj@4).
What license covers Mixtral MoE 8x22B Instruct?
Apache 2.0, Mistral AI's most permissive open-source license, allowing commercial use and redistribution.
Why does Mixtral MoE 8x22B Instruct outperform dense 70B models in speed?
The sparse MoE architecture activates only 39B of 141B total parameters per token (roughly 28% of the weights), giving it a throughput profile closer to a 39B dense model while drawing on a much larger parameter space.