Mistral Large 3
Mistral Large 3 is a sparse mixture-of-experts (MoE) model from Mistral AI, activating 41B of its 675B total parameters per forward pass. It is the company's first MoE release since the Mixtral series.
```ts
import { streamText } from 'ai'

const result = streamText({
  model: 'mistral/mistral-large-3',
  prompt: 'Why is the sky blue?',
})
```
What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure it, see the documentation.
Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
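As a minimal sketch of what this looks like in practice, the credential lives in the environment rather than in application code. The AI_GATEWAY_API_KEY variable name and the error-check pattern below are assumptions for illustration, not a prescribed setup; an OIDC token from the deployment platform can stand in for the key where supported.

```ts
import { streamText } from 'ai'

// Assumption: the gateway reads its credential from AI_GATEWAY_API_KEY.
// Set it in your deployment environment; where OIDC tokens are supported,
// no gateway or provider key needs to be handled in application code.
if (!process.env.AI_GATEWAY_API_KEY) {
  throw new Error('Missing AI_GATEWAY_API_KEY - set it in your environment')
}

const result = streamText({
  model: 'mistral/mistral-large-3',
  prompt: 'Summarize the benefits of sparse MoE models in two sentences.',
})

// Stream the generated text as it arrives.
for await (const delta of result.textStream) {
  process.stdout.write(delta)
}
```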
Mistral Large 3 marks Mistral AI's return to MoE architecture: sparse activation means only a fraction of the total parameters run per token, and as of the Mistral 3 announcement it is the company's largest general-purpose open release.
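For a rough sense of what that sparse activation ratio means, the back-of-the-envelope calculation below uses the parameter counts quoted above and assumes per-token compute scales with the number of parameters touched (a simplification that ignores attention, routing, and memory overheads):

```ts
// Rough illustration only: assumes per-token compute is proportional to the
// number of parameters activated, which is a simplification.
const totalParams = 675e9  // 675B total parameters
const activeParams = 41e9  // 41B active per forward pass

const activeFraction = activeParams / totalParams
console.log(`Active per token: ${(activeFraction * 100).toFixed(1)}% of total weights`)
// => roughly 6.1%; per-token compute lands closer to a ~41B dense model
// than to a 675B dense one, while all 675B must still fit in memory.
```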
When to Use Mistral Large 3
Best For
High-capability MoE tasks:
Demanding workloads that call for the top of Mistral AI's general-purpose MoE lineup
Complex reasoning and analysis:
Tasks that benefit from a large total parameter pool
Long-form content generation:
Long outputs where coherent multi-step logic has to hold across the whole piece
Mistral AI ecosystem fit:
Applications that rely on its tooling, fine-tuning, or enterprise agreements
MoE inference efficiency:
Workflows where sparse activation's lower per-token compute is preferred over a pure dense-model approach
Consider Alternatives When
Explicit chain-of-thought reasoning:
Your task requires reasoning traces (consider Magistral Medium)
Primary cost constraint:
Mistral Small or a Ministral variant meets accuracy requirements at lower per-token cost than the 675B flagship
Vision capabilities:
You need multimodal input (consider Pixtral Large)
Conclusion
Mistral Large 3 brings back sparse MoE at a larger scale than Mixtral. For teams that want Mistral AI's largest general-purpose open MoE with 41B active parameters per forward pass, it fills that tier.
FAQ
What kind of model is Mistral Large 3?
A sparse mixture-of-experts (MoE) model with 675B total parameters and 41B active per forward pass.
Is Mistral Large 3 a dense model?
No. Mistral AI describes Mistral Large 3 as the company's first MoE model since the Mixtral series, returning to sparse architecture at a larger scale.
When was Mistral Large 3 released?
December 2, 2025.
How does the MoE architecture affect inference cost?
Only 41B of 675B total parameters activate per forward pass, so inference costs stay closer to a 41B dense model than a 675B dense model.
Does AI Gateway support BYOK for Mistral Large 3?
Yes. AI Gateway supports Bring Your Own Key (BYOK) configuration.
Is Zero Data Retention available for Mistral Large 3?
Yes, but ZDR on AI Gateway applies only to direct gateway requests; BYOK flows aren't covered. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.
How does Mistral Large 3 compare to Magistral Medium?
Mistral Large 3 is the general-purpose MoE model in Mistral AI's lineup. Magistral Medium is a reasoning model with traceable chain-of-thought and published AIME 2024 scores. Pick Magistral Medium when you need explicit reasoning traces; pick Mistral Large 3 for general tasks without that requirement.
What does AI Gateway's observability show for Mistral Large 3 requests?
Request-level cost, latency, token counts, and provider routing decisions, without additional instrumentation.