Mixtral MoE 8x22B Instruct
Mixtral MoE 8x22B Instruct is a sparse mixture-of-experts model with 141B total parameters and 39B active per forward pass, offering a context window of 65.5K tokens, native function calling, and Apache 2.0 licensing.
```ts
import { streamText } from 'ai'

const result = streamText({
  model: 'mistral/mixtral-8x22b-instruct',
  prompt: 'Why is the sky blue?',
})

// Print the response as it streams in.
for await (const text of result.textStream) process.stdout.write(text)
```

Frequently Asked Questions
What is the architecture of Mixtral MoE 8x22B Instruct?
A sparse Mixture-of-Experts model with 8 expert networks, 141B total parameters, and 39B active parameters per forward pass.
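For intuition, here is a toy sketch of top-2 expert routing, not Mistral's implementation: a router scores all 8 experts for each token, only the two highest-scoring experts run, and their outputs are blended by softmax-normalized router weights.

```ts
// Toy illustration of top-2 MoE routing over 8 experts (placeholder math only).
type Vec = number[]

const NUM_EXPERTS = 8
const TOP_K = 2

// Stand-in "experts": each would be a full feed-forward block in the real model.
const experts: ((x: Vec) => Vec)[] = Array.from({ length: NUM_EXPERTS }, (_, i) =>
  (x: Vec) => x.map((v) => v * (i + 1)),
)

function softmax(scores: number[]): number[] {
  const max = Math.max(...scores)
  const exps = scores.map((s) => Math.exp(s - max))
  const sum = exps.reduce((a, b) => a + b, 0)
  return exps.map((e) => e / sum)
}

function moeLayer(x: Vec, routerScores: number[]): Vec {
  // Keep only the two highest-scoring experts for this token...
  const ranked = routerScores
    .map((score, idx) => ({ score, idx }))
    .sort((a, b) => b.score - a.score)
    .slice(0, TOP_K)
  // ...and blend their outputs by normalized router weights.
  const weights = softmax(ranked.map((r) => r.score))
  const out: Vec = new Array(x.length).fill(0)
  ranked.forEach((r, k) => {
    const y = experts[r.idx](x)
    y.forEach((v, d) => (out[d] += weights[k] * v))
  })
  return out
}
```

Because only 2 of the 8 experts execute per token, most parameters sit idle on any given forward pass, which is where the 39B-active-of-141B-total figure comes from.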
What is the context window?
65.5K tokens.
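If you want to avoid overrunning that limit client-side, a rough pre-check can help. This sketch assumes a crude ~4-characters-per-token heuristic; a real tokenizer will give different counts.

```ts
// Rough guard against exceeding the 65,536-token context window.
// The 4-characters-per-token estimate is only a heuristic.
const CONTEXT_WINDOW = 65_536

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

function fitsInContext(prompt: string, reservedForOutput = 1_024): boolean {
  return estimateTokens(prompt) + reservedForOutput <= CONTEXT_WINDOW
}
```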
Does Mixtral MoE 8x22B Instruct support function calling?
Yes. Native function calling is included in the instruct variant.
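For example, tools can be passed to the model via the AI SDK's tool helper. The weather tool below is purely illustrative, and the exact option names (e.g. inputSchema) may vary between SDK versions.

```ts
import { generateText, tool } from 'ai'
import { z } from 'zod'

// Hypothetical example tool; the name, schema, and return value are illustrative.
const { text } = await generateText({
  model: 'mistral/mixtral-8x22b-instruct',
  tools: {
    getWeather: tool({
      description: 'Get the current weather for a city',
      inputSchema: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, temperatureC: 21 }),
    }),
  },
  prompt: 'What is the weather in Paris right now?',
})
```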
What are the math benchmark scores?
90.8% on GSM8K (maj@8) and 44.6% on MATH (maj@4).
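For context, maj@k scoring samples k answers per problem and keeps the most common one. A minimal sketch of that aggregation step (not the benchmark harness itself):

```ts
// Majority vote over k sampled answers (maj@k).
function majorityVote(answers: string[]): string {
  const counts = new Map<string, number>()
  for (const a of answers) counts.set(a, (counts.get(a) ?? 0) + 1)
  let best = answers[0]
  let bestCount = 0
  for (const [answer, count] of counts) {
    if (count > bestCount) {
      best = answer
      bestCount = count
    }
  }
  return best
}

// e.g. maj@8: sample 8 final answers, keep the most frequent one.
console.log(majorityVote(['42', '41', '42', '42', '40', '42', '41', '42'])) // '42'
```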
What license covers Mixtral MoE 8x22B Instruct?
Apache 2.0, Mistral AI's most permissive open-source license, allowing commercial use and redistribution.
Why does Mixtral MoE 8x22B Instruct outperform dense 70B models in speed?
The sparse MoE architecture activates only 39B of 141B total parameters per token, giving it a throughput profile closer to a 39B dense model while drawing on a much larger parameter space.
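As a back-of-the-envelope check using the numbers above, per-token compute scales roughly with active rather than total parameters:

```ts
// Back-of-the-envelope: per-token FLOPs scale roughly with active parameters.
const totalParams = 141e9
const activeParams = 39e9

// Only ~27.7% of the parameters participate in each forward pass,
// so per-token compute is close to that of a ~39B dense model.
const activeFraction = activeParams / totalParams
console.log(`${(activeFraction * 100).toFixed(1)}% active per token`) // "27.7% active per token"
```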