Mistral Large 3
Mistral Large 3 is a large-scale model from Mistral AI, built on a sparse mixture-of-experts (MoE) architecture with 41B active parameters out of 675B total. It is the company's first MoE release since the Mixtral series.
import { streamText } from 'ai'

// Stream a completion from Mistral Large 3 through AI Gateway
const result = streamText({ model: 'mistral/mistral-large-3', prompt: 'Why is the sky blue?' })
for await (const chunk of result.textStream) process.stdout.write(chunk)

Frequently Asked Questions
What is Mistral Large 3's architecture?
A sparse mixture-of-experts (MoE) model with 675B total parameters and 41B active per forward pass.
Is this the first Mistral AI MoE model?
No. Mistral AI describes Mistral Large 3 as the company's first MoE model since the Mixtral series, returning to sparse architecture at a larger scale.
When was Mistral Large 3 added to AI Gateway?
December 2, 2025.
How does the MoE architecture affect inference cost?
Only 41B of the 675B total parameters activate on each forward pass, so per-token inference cost is closer to that of a 41B dense model than of a 675B one.
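As a back-of-the-envelope sketch (not a pricing formula; real cost also depends on routing overhead, memory footprint, and provider pricing), per-token compute scales roughly with active parameters:

// Rough illustration: fraction of parameters active per forward pass.
// Assumes per-token FLOPs scale with active parameters (a simplification).
const totalParams = 675e9
const activeParams = 41e9
const activeFraction = activeParams / totalParams
console.log(`Active fraction: ${(activeFraction * 100).toFixed(1)}%`) // ~6.1%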
Does AI Gateway support BYOK for Mistral Large 3?
Yes. AI Gateway supports Bring Your Own Key (BYOK) configuration, so requests can run against your own Mistral AI credentials.
Is Zero Data Retention available for Mistral Large 3?
Yes. Note that ZDR on AI Gateway applies to direct gateway requests; BYOK flows aren't covered. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.
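For reference, gateway credentials can be passed explicitly via the @ai-sdk/gateway provider. A minimal sketch, assuming the AI_GATEWAY_API_KEY environment variable is set (BYOK provider keys themselves are configured in the AI Gateway dashboard, not in this call):

import { createGateway } from '@ai-sdk/gateway'
import { generateText } from 'ai'

// Explicitly configured gateway instance; apiKey falls back to the
// AI_GATEWAY_API_KEY environment variable when omitted.
const gateway = createGateway({ apiKey: process.env.AI_GATEWAY_API_KEY })

const { text } = await generateText({
  model: gateway('mistral/mistral-large-3'),
  prompt: 'Summarize the benefits of sparse MoE models.',
})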
How does Mistral Large 3 compare to Magistral Medium?
Mistral Large 3 is the general-purpose MoE model in Mistral AI's lineup. Magistral Medium is a reasoning model with traceable chain-of-thought and published AIME 2024 scores. Pick Magistral Medium when you need explicit reasoning traces; pick Mistral Large 3 for general tasks without that requirement.
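A minimal sketch of that selection logic (the magistral model id below is an assumption; check the AI Gateway model catalog for the exact slug):

import { generateText } from 'ai'

// Hypothetical helper: route reasoning-heavy tasks to Magistral Medium,
// everything else to Mistral Large 3. Model ids are assumed slugs.
function pickModel(needsReasoningTrace: boolean): string {
  return needsReasoningTrace ? 'mistral/magistral-medium' : 'mistral/mistral-large-3'
}

const { text } = await generateText({
  model: pickModel(false),
  prompt: 'Draft a product update announcement.',
})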
What observability does AI Gateway provide for Mistral Large 3?
Request-level cost, latency, token counts, and provider routing decisions, without additional instrumentation.
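No instrumentation is needed for the dashboard metrics, but per-request token usage is also exposed on the SDK result if you want it in application logs. A sketch, with field names as in AI SDK 5:

import { streamText } from 'ai'

const result = streamText({ model: 'mistral/mistral-large-3', prompt: 'Why is the sky blue?' })
for await (const chunk of result.textStream) process.stdout.write(chunk)

// usage resolves once the stream finishes
const usage = await result.usage
console.log(`in: ${usage.inputTokens}, out: ${usage.outputTokens}`)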