
Mistral Large 3

Mistral Large 3 is a large-scale sparse mixture-of-experts (MoE) model from Mistral AI, with 41B active parameters out of 675B total. It is the company's first MoE release since the Mixtral series.

Capabilities: Vision (Image)
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'mistral/mistral-large-3',
  prompt: 'Why is the sky blue?'
})

// print tokens to stdout as they arrive (assumes AI_GATEWAY_API_KEY is set)
for await (const text of result.textStream) process.stdout.write(text)

Playground

Try out Mistral Large 3 by Mistral AI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
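As an example, here is a minimal sketch of setting a provider preference from the AI SDK. The gateway `order` provider option and the 'mistral' slug are assumptions drawn from the gateway docs, so verify the exact shape there:

import { streamText } from 'ai'

const result = streamText({
  model: 'mistral/mistral-large-3',
  // prefer a specific provider; use a slug copied from the table below
  providerOptions: { gateway: { order: ['mistral'] } },
  prompt: 'Why is the sky blue?'
})

for await (const text of result.textStream) process.stdout.write(text)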

Provider | Context | Latency | Throughput | Input | Output | Release Date
Mistral AI (Legal: Terms, Privacy) | 256K | 0.4s | 63 tps | $0.50/M | $1.50/M | 12/02/2025
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
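For reference, "P50" is simply the median of the observed samples. A tiny sketch, using hypothetical TTFT values:

// P50 = median: the middle value of the sorted samples
const samples = [120, 380, 400, 410, 950] // hypothetical TTFT samples, in ms
const sorted = [...samples].sort((a, b) => a - b)
const p50 = sorted[Math.floor(sorted.length / 2)]
console.log(`P50 TTFT: ${p50}ms`) // -> P50 TTFT: 400ms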

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Mistral AI

Model | Context | Latency | Throughput | Input | Output | Release Date
- | 256K | 0.3s | 56 tps | $0.40/M | $2.00/M | 12/09/2025
- | 256K | 0.2s | 118 tps | $0.20/M | $0.20/M | 12/01/2025
- | 128K | 0.3s | 81 tps | $0.40/M | $2.00/M | 05/07/2025
- | 128K | 0.2s | 264 tps | $0.10/M | $0.10/M | 10/01/2024
- | 32K | 0.4s | 138 tps | $0.10/M | $0.30/M | 09/01/2024
- | - | - | - | $0.10/M | - | 12/11/2023

About Mistral Large 3

Announced December 2, 2025, Mistral Large 3 marks Mistral AI's return to the mixture-of-experts (MoE) architecture that defined their earlier Mixtral series, now at a larger scale. With 675B total parameters and 41B active per forward pass, Mistral Large 3 represents a substantial architectural evolution from the dense models that preceded it in the Large lineage.

The sparse MoE design lets Mistral Large 3 maintain inference efficiency comparable to a much smaller dense model while drawing on the full 675B-parameter pool for complex tasks: per-token compute scales with the 41B active parameters, while all experts must still be held in memory, trading a larger deployment footprint for flagship-class capability at lower compute per token.
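As a back-of-the-envelope illustration of that tradeoff, the sketch below uses the common heuristic of roughly 2 FLOPs per active parameter per generated token; the constant is an assumption, not a published figure for this model:

// per-token compute scales with active, not total, parameters
const totalParams = 675e9   // 675B total parameters
const activeParams = 41e9   // 41B active per forward pass
const flopsIfDense = 2 * totalParams // hypothetical dense model of equal size
const flopsMoE = 2 * activeParams    // only the routed experts execute
console.log(`~${(100 * flopsMoE / flopsIfDense).toFixed(1)}% of dense per-token compute`)
// -> ~6.1% of dense per-token compute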

Through AI Gateway, you can access Mistral Large 3 without separate Mistral AI API credentials. Built-in observability gives you cost and latency visibility across every request.

What To Consider When Choosing a Provider

  • Configuration: Mistral Large 3 returns to a sparse MoE architecture, so only a fraction of the total parameters (41B of 675B) run per token, in what Mistral AI positions as its largest general-purpose open release as of the Mistral 3 announcement.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK requests are not covered). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token, so you do not need to manage provider credentials directly; see the sketch below.
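A minimal sketch of that authentication flow, assuming the @ai-sdk/gateway package's createGateway helper; the fallback behavior noted in the comment is based on the gateway docs:

import { generateText } from 'ai'
import { createGateway } from '@ai-sdk/gateway'

// explicit key shown for clarity; when omitted, the provider is expected to
// fall back to the AI_GATEWAY_API_KEY env var (or OIDC on Vercel deployments)
const gateway = createGateway({ apiKey: process.env.AI_GATEWAY_API_KEY })

const { text } = await generateText({
  model: gateway('mistral/mistral-large-3'),
  prompt: 'In one sentence, what is a sparse mixture-of-experts model?'
})
console.log(text)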

When to Use Mistral Large 3

Best For

  • High-capability general tasks: Demanding workloads that call for the flagship of Mistral AI's general-purpose MoE lineup
  • Complex reasoning and analysis: Tasks that benefit from a large total parameter pool
  • Long-form content generation: Long outputs where coherent multi-step logic has to hold across the whole piece
  • Mistral AI ecosystem fit: Applications that rely on its tooling, fine-tuning, or enterprise agreements
  • MoE inference efficiency: Workflows where sparse activation's compute profile is preferred over a pure dense-model approach

Consider Alternatives When

  • Explicit chain-of-thought reasoning: Your task requires reasoning traces (consider Magistral Medium)
  • Primary cost constraint: Mistral Small or a Ministral variant meets accuracy requirements at lower per-token cost than the 675B flagship
  • Vision capabilities: You need multimodal input (consider Pixtral Large)

Conclusion

Mistral Large 3 brings back sparse MoE at a larger scale than Mixtral. For teams that want Mistral AI's largest general-purpose open MoE with 41B active parameters per forward pass, it fills that tier.

Frequently Asked Questions

  • What is Mistral Large 3's architecture?

    A sparse mixture-of-experts (MoE) model with 675B total parameters and 41B active per forward pass.

  • Is this the first Mistral AI MoE model?

    No. Mistral AI describes Mistral Large 3 as the company's first MoE model since the Mixtral series, returning to sparse architecture at a larger scale.

  • When was Mistral Large 3 added to AI Gateway?

    December 2, 2025.

  • How does the MoE architecture affect inference cost?

    Only 41B of 675B total parameters activate per forward pass, so inference costs stay closer to a 41B dense model than a 675B dense model.

  • Does AI Gateway support BYOK for Mistral Large 3?

Yes. AI Gateway supports Bring Your Own Key (BYOK) configuration for this model. Zero Data Retention is also available, offered on a per-provider basis, but it applies only to direct gateway requests, not BYOK requests. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.

  • How does Mistral Large 3 compare to Magistral Medium?

    Mistral Large 3 is the general-purpose MoE model in Mistral AI's lineup. Magistral Medium is a reasoning model with traceable chain-of-thought and published AIME 2024 scores. Pick Magistral Medium when you need explicit reasoning traces; pick Mistral Large 3 for general tasks without that requirement.

  • What observability does AI Gateway provide for Mistral Large 3?

    Request-level cost, latency, token counts, and provider routing decisions, without additional instrumentation.