Mixtral MoE 8x22B Instruct

Mixtral MoE 8x22B Instruct is a sparse mixture-of-experts model with 141B total parameters and 39B active per forward pass, offering a context window of 65.5K tokens, native function calling, and Apache 2.0 licensing.

index.ts
import { streamText } from 'ai'
const result = streamText({
  model: 'mistral/mixtral-8x22b-instruct',
  prompt: 'Why is the sky blue?',
})

// Print the response as tokens arrive from the model
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

Frequently Asked Questions

  • What is the architecture of Mixtral MoE 8x22B Instruct?

    A sparse Mixture-of-Experts model with 8 expert networks, 141B total parameters, and 39B active parameters per forward pass.

  • What is the context window?

    65.5K tokens.

  • Does Mixtral MoE 8x22B Instruct support function calling?

    Yes. Native function calling is built into the instruct variant; see the tool-use sketch after this list.

  • What are the math benchmark scores?

    90.8% on GSM8K (maj@8) and 44.6% on MATH (maj@4).

  • What license covers Mixtral MoE 8x22B Instruct?

    Apache 2.0, Mistral AI's most permissive open-source license, allowing commercial use and redistribution.

  • Why is Mixtral MoE 8x22B Instruct faster than dense 70B models?

    The sparse MoE architecture activates only 39B of its 141B total parameters per token, giving it a throughput profile closer to a 39B dense model while drawing on a much larger parameter space; the routing sketch below illustrates how.
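
To make the function-calling answer above concrete, here is a minimal sketch using the AI SDK's tools option. It assumes a recent AI SDK release where a tool's schema is passed as inputSchema (older releases use parameters) and that zod is installed; the getWeather tool, its schema, and its return value are hypothetical examples, not part of the model or provider API.

import { generateText, tool } from 'ai'
import { z } from 'zod'

const result = await generateText({
  model: 'mistral/mixtral-8x22b-instruct',
  prompt: 'What is the weather like in Paris right now?',
  tools: {
    // Hypothetical tool used purely for illustration
    getWeather: tool({
      description: 'Get the current weather for a city',
      inputSchema: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, temperatureC: 21 }),
    }),
  },
})

// The model's structured calls to the tool are available on the result
console.log(result.toolCalls)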
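
The speed answer comes down to top-2 routing: for each token, a small gating network scores the 8 expert feed-forward blocks, and only the 2 highest-scoring experts actually run. The sketch below is an illustrative reconstruction of that routing step, not Mixtral's implementation; the types, helper names, and dimensions are invented for the example.

// Illustrative top-2 mixture-of-experts routing for a single token
type Vector = number[]
type Expert = (x: Vector) => Vector

function softmax(logits: number[]): number[] {
  const m = Math.max(...logits)
  const exps = logits.map((x) => Math.exp(x - m))
  const sum = exps.reduce((a, b) => a + b, 0)
  return exps.map((e) => e / sum)
}

function routeToken(token: Vector, experts: Expert[], gateLogits: number[]): Vector {
  // Rank experts by router score and keep only the top 2
  const top2 = gateLogits
    .map((logit, i) => ({ logit, i }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, 2)

  // Renormalize the two selected gate scores so they sum to 1
  const weights = softmax(top2.map((e) => e.logit))

  // Only the selected experts execute; the other six are skipped entirely
  const output: Vector = new Array(token.length).fill(0)
  top2.forEach((e, k) => {
    const expertOut = experts[e.i](token)
    for (let d = 0; d < output.length; d++) output[d] += weights[k] * expertOut[d]
  })
  return output
}

Working back from the published figures under that top-2 assumption, each expert feed-forward block comes out to roughly (141B − 39B) / 6 ≈ 17B parameters, with the remaining ~5B (attention, embeddings, router) shared by every token.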