
Mixtral MoE 8x22B Instruct

mistral/mixtral-8x22b-instruct

Mixtral MoE 8x22B Instruct is a sparse mixture-of-experts model with 141B total parameters and 39B active per forward pass, offering a context window of 65.5K tokens, native function calling, and Apache 2.0 licensing.

index.ts

```typescript
import { streamText } from 'ai';

const result = streamText({
  model: 'mistral/mixtral-8x22b-instruct',
  prompt: 'Why is the sky blue?',
});

// Consume the stream; the request is not fully processed until read.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

  • Authentication

    AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

The context window of 65.5K tokens enables Mixtral MoE 8x22B Instruct to process large documents, multi-file codebases, or long conversation histories in a single request, well suited for applications that regularly hit context limits on smaller models.
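As a rough pre-flight check (an illustration, not part of the gateway API; real token counts depend on the model's tokenizer), the common ~4-characters-per-token heuristic can estimate whether a document fits in the 65.5K window before sending it:

```typescript
// Hypothetical helper: rough token estimate using the ~4 chars/token
// heuristic for English text. Treat this as a sanity check only;
// the model's tokenizer determines the actual count.
const CONTEXT_WINDOW = 65_536;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function fitsInContext(text: string, reserveForOutput = 4_096): boolean {
  return estimateTokens(text) + reserveForOutput <= CONTEXT_WINDOW;
}

const doc = 'word '.repeat(10_000); // ~50k characters, ~12.5k tokens
console.log(fitsInContext(doc)); // true
```

Reserving headroom for the model's output (here 4,096 tokens) avoids requests that fit the prompt but leave no room for a response.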

When to Use Mixtral MoE 8x22B Instruct

Best For

  • Complex multilingual reasoning:

    Across English, French, Italian, German, and Spanish

  • Native function calling:

    Applications requiring structured output constraints

  • Long-document processing:

    Multi-file analysis within the context window of 65.5K tokens

  • Mathematics and quantitative reasoning:

    Quantitative problem solving at the 90.8% GSM8K benchmark level

  • Apache 2.0 open-weight licensing:

    Teams that need permissive, commercial-friendly licensing for a large, capable model
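Native function calling means the model can emit a structured tool call instead of free text, which the application then dispatches to real code. The sketch below is a simplified, framework-free illustration of that dispatch step (the tool names and shapes are invented for this example; in practice the AI SDK's `tool()` helper handles the wiring):

```typescript
// Simplified sketch of the application side of function calling.
// The model returns a structured call like this instead of prose;
// names and argument shapes below are illustrative only.
type ToolCall = { name: string; arguments: Record<string, unknown> };

// Hypothetical tool registry the app exposes to the model.
const tools: Record<string, (args: any) => string> = {
  getWeather: ({ city }: { city: string }) => `Sunny in ${city}`,
  convertUnits: ({ celsius }: { celsius: number }) =>
    `${(celsius * 9) / 5 + 32} °F`,
};

function dispatch(call: ToolCall): string {
  const fn = tools[call.name];
  if (!fn) throw new Error(`Unknown tool: ${call.name}`);
  return fn(call.arguments);
}

// A tool call as the model might emit it:
const call: ToolCall = { name: 'convertUnits', arguments: { celsius: 20 } };
console.log(dispatch(call)); // "68 °F"
```

The tool's result is typically sent back to the model in a follow-up message so it can compose a final answer.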

Consider Alternatives When

  • Newer large-scale MoE:

    You want Mistral AI Large 3's 675B total / 41B active architecture

  • Explicit reasoning traces:

    Your workload requires step-by-step reasoning output (consider Magistral models)

  • Vision capabilities:

    Your workload requires image input (consider Pixtral Large)

Conclusion

Mixtral MoE 8x22B Instruct combines 141B total parameters, 39B active per forward pass, Apache 2.0 licensing, and native function calling. It delivers faster inference than dense 70B models at this capability tier and remains an option for teams that need a large open-weight model with commercial-friendly terms.

FAQ

What is Mixtral MoE 8x22B Instruct?

A sparse Mixture-of-Experts model with 8 expert networks, 141B total parameters, and 39B active parameters per forward pass.

What is the context window?

65.5K tokens.

Does it support function calling?

Yes. Native function calling is included in the instruct variant.

How does it perform on math benchmarks?

90.8% on GSM8K (maj@8) and 44.6% on MATH (maj@4).

What license does it use?

Apache 2.0, Mistral AI's most permissive open-source license, allowing commercial use and redistribution.

Why is it faster than a dense model of similar size?

The sparse MoE architecture activates only 39B of its 141B total parameters per token, giving it a throughput profile closer to a 39B dense model while drawing on a much larger parameter space.
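A back-of-envelope calculation makes the trade-off concrete. This is an illustration only, not a performance model: real throughput also depends on memory bandwidth, expert-routing overhead, and batching.

```typescript
// Per-token compute scales roughly with *active* parameters, while
// memory footprint scales with *total* parameters. Figures are from
// the model card; the dense 70B baseline is for comparison only.
const totalParamsB = 141; // total parameters (billions)
const activeParamsB = 39; // active per forward pass (billions)
const denseBaselineB = 70; // a dense 70B model for comparison

const activeFraction = activeParamsB / totalParamsB;
const computeVsDense = activeParamsB / denseBaselineB;

console.log(`Active fraction: ${(activeFraction * 100).toFixed(1)}%`); // 27.7%
console.log(`Per-token compute vs dense 70B: ${(computeVsDense * 100).toFixed(0)}%`); // 56%
```

In other words, each token touches roughly a quarter of the weights, and per-token compute is about half that of a dense 70B model, while serving still requires holding all 141B parameters in memory.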