Mixtral MoE 8x22B Instruct

Mixtral MoE 8x22B Instruct is a sparse mixture-of-experts model with 141B total parameters and 39B active per forward pass, offering a context window of 65.5K tokens, native function calling, and Apache 2.0 licensing.

index.ts
import { streamText } from 'ai'
const result = streamText({
  model: 'mistral/mixtral-8x22b-instruct',
  prompt: 'Why is the sky blue?',
})

// Print the response as tokens arrive from the model
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

Frequently Asked Questions

  • What is the architecture of Mixtral MoE 8x22B Instruct?

    A sparse Mixture-of-Experts model with 8 expert networks, 141B total parameters, and 39B active parameters per forward pass.

  • What is the context window?

    65.5K tokens.

  • Does Mixtral MoE 8x22B Instruct support function calling?

    Yes. Native function calling is built into the instruct variant; see the tool-use sketch after this list.

  • What are the math benchmark scores?

    90.8% on GSM8K (maj@8) and 44.6% on MATH (maj@4).

  • What license covers Mixtral MoE 8x22B Instruct?

    Apache 2.0, Mistral AI's most permissive open-source license, allowing commercial use and redistribution.

  • Why is Mixtral MoE 8x22B Instruct faster than dense 70B models?

    The sparse MoE architecture activates only 39B of its 141B total parameters per token, giving it a throughput profile closer to a 39B dense model while drawing on a much larger parameter space; the routing sketch below illustrates how.
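
To make the function-calling answer above concrete, here is a minimal sketch using the AI SDK's tools option. It assumes a recent AI SDK release where a tool's schema is passed as inputSchema (older releases use parameters) and that zod is installed; the getWeather tool, its schema, and its return value are hypothetical examples, not part of the model or provider API.

import { generateText, tool } from 'ai'
import { z } from 'zod'

const result = await generateText({
  model: 'mistral/mixtral-8x22b-instruct',
  prompt: 'What is the weather like in Paris right now?',
  tools: {
    // Hypothetical tool used purely for illustration
    getWeather: tool({
      description: 'Get the current weather for a city',
      inputSchema: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, temperatureC: 21 }),
    }),
  },
})

// The model's structured calls to the tool are available on the result
console.log(result.toolCalls)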
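
The speed answer comes down to top-2 routing: for each token, a small gating network scores the 8 expert feed-forward blocks, and only the 2 highest-scoring experts actually run. The sketch below is an illustrative reconstruction of that routing step, not Mixtral's implementation; the types, helper names, and dimensions are invented for the example.

// Illustrative top-2 mixture-of-experts routing for a single token
type Vector = number[]
type Expert = (x: Vector) => Vector

function softmax(logits: number[]): number[] {
  const m = Math.max(...logits)
  const exps = logits.map((x) => Math.exp(x - m))
  const sum = exps.reduce((a, b) => a + b, 0)
  return exps.map((e) => e / sum)
}

function routeToken(token: Vector, experts: Expert[], gateLogits: number[]): Vector {
  // Rank experts by router score and keep only the top 2
  const top2 = gateLogits
    .map((logit, i) => ({ logit, i }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, 2)

  // Renormalize the two selected gate scores so they sum to 1
  const weights = softmax(top2.map((e) => e.logit))

  // Only the selected experts execute; the other six are skipped entirely
  const output: Vector = new Array(token.length).fill(0)
  top2.forEach((e, k) => {
    const expertOut = experts[e.i](token)
    for (let d = 0; d < output.length; d++) output[d] += weights[k] * expertOut[d]
  })
  return output
}

Working back from the published figures under that top-2 assumption, each expert feed-forward block comes out to roughly (141B − 39B) / 6 ≈ 17B parameters, with the remaining ~5B (attention, embeddings, router) shared by every token.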