Mistral Nemo

Mistral Nemo is a 12B model with a context window of 131.1K tokens and the Tekken tokenizer trained on 100+ languages, offering ~30% better source code compression and improved multilingual efficiency as a drop-in replacement for Mistral 7B.

Tool Use
index.ts
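import { streamText } from 'ai'

const result = streamText({
  model: 'mistral/mistral-nemo',
  prompt: 'Why is the sky blue?',
})

// Print each text chunk as it arrives.
for await (const chunk of result.textStream) process.stdout.write(chunk)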

Playground

Try out Mistral Nemo by Mistral AI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

| Provider   | Legal          | Context | Latency | Throughput | Input   | Output  | Release Date |
|------------|----------------|---------|---------|------------|---------|---------|--------------|
| Novita AI  | Terms, Privacy | 60K     | 0.7s    | 58tps      | $0.04/M | $0.17/M | 07/01/2024   |
| Mistral AI | Terms, Privacy | 128K    | 0.2s    | 112tps     | $0.15/M | $0.15/M | 07/01/2024   |
| DeepInfra  | Terms, Privacy | 131K    | 0.3s    | 53tps      | $0.02/M | $0.04/M | 07/01/2024   |
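
One way to turn the provider figures into a preference order is to rank providers by estimated request cost, skipping any whose context window is too small. The sketch below is illustrative only: the slugs (`novita`, `mistral`, `deepinfra`), the rounded context sizes, and the helper names `requestCostUSD` and `rankProviders` are assumptions, not values taken from the gateway. Copy the real slugs from the provider list, and check the docs for how to pass a preference.

```typescript
// Prices and context sizes copied from the provider listing above.
// Slugs and the rounding of "60K"/"128K"/"131K" are assumptions for illustration.
interface ProviderRow {
  slug: string
  contextTokens: number
  inputPerMillionUSD: number
  outputPerMillionUSD: number
}

const providers: ProviderRow[] = [
  { slug: 'novita', contextTokens: 60000, inputPerMillionUSD: 0.04, outputPerMillionUSD: 0.17 },
  { slug: 'mistral', contextTokens: 128000, inputPerMillionUSD: 0.15, outputPerMillionUSD: 0.15 },
  { slug: 'deepinfra', contextTokens: 131000, inputPerMillionUSD: 0.02, outputPerMillionUSD: 0.04 },
]

// Estimated cost of a single request in USD.
function requestCostUSD(p: ProviderRow, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.inputPerMillionUSD + outputTokens * p.outputPerMillionUSD) / 1e6
}

// Slugs of providers whose context window fits the request, cheapest first.
function rankProviders(rows: ProviderRow[], inputTokens: number, outputTokens: number): string[] {
  return rows
    .filter((p) => p.contextTokens >= inputTokens + outputTokens)
    .sort(
      (a, b) =>
        requestCostUSD(a, inputTokens, outputTokens) - requestCostUSD(b, inputTokens, outputTokens),
    )
    .map((p) => p.slug)
}

console.log(rankProviders(providers, 100000, 2000))
```

Ranking purely on price ignores latency and throughput. A production policy would likely weigh those as well.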
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
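
The two metrics above combine into a rough end-to-end estimate: time to first token plus output length divided by throughput. The sketch below assumes latency is given in seconds, as in the provider listing. It treats the P50 figures as typical values rather than guarantees, and `estimateResponseSeconds` is a hypothetical helper name.

```typescript
// Rough response-time estimate from median TTFT and median throughput.
function estimateResponseSeconds(
  ttftSeconds: number,
  tokensPerSecond: number,
  outputTokens: number,
): number {
  if (tokensPerSecond <= 0) throw new RangeError('tokensPerSecond must be positive')
  return ttftSeconds + outputTokens / tokensPerSecond
}

// Example: a 560-token answer at 0.2s TTFT and 112 tokens per second.
console.log(estimateResponseSeconds(0.2, 112, 560).toFixed(1))
```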

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Mistral AI

| Model | Context | Latency | Throughput | Input   | Output  | Providers  | Release Date |
|-------|---------|---------|------------|---------|---------|------------|--------------|
|       | 256K    | 0.3s    | 63tps      | $0.40/M | $2.00/M | Mistral AI | 12/09/2025   |
|       | 256K    | 0.2s    | 58tps      | $0.20/M | $0.20/M | Mistral AI | 12/01/2025   |
|       | 128K    | 0.8s    | 58tps      | $0.40/M | $2.00/M | Mistral AI | 05/07/2025   |
|       | 128K    | 0.2s    | 171tps     | $0.10/M | $0.10/M | Mistral AI | 10/01/2024   |
|       | 32K     | 0.4s    |            | $0.10/M | $0.30/M | Mistral AI | 09/01/2024   |
|       |         |         |            | $0.10/M |         | Mistral AI | 12/11/2023   |

About Mistral Nemo

Released July 1, 2024, Mistral Nemo was built in collaboration with NVIDIA and introduced the Tekken tokenizer, trained across 100+ languages, as its defining technical innovation. Tekken achieves ~30% better compression for source code compared to previous Mistral AI tokenizers, 2x better compression for Korean, and 3x better compression for Arabic. These compression gains directly reduce token consumption and cost.
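
To make the cost effect concrete, the sketch below converts a compression ratio into a token count and then into a price. It assumes that "2x" and "3x better compression" mean the same text needs one half and one third as many tokens, and that "~30% better" corresponds to a ratio of about 1.3. These are readings of the figures above, not published definitions. The helper names are hypothetical, and the $0.15/M price is the Mistral AI input price from the provider listing.

```typescript
// Tokens needed for the same text after applying a compression ratio
// (ratio 2 means half as many tokens). Interpretation is an assumption.
function tokensAfterCompression(baselineTokens: number, compressionRatio: number): number {
  if (compressionRatio <= 0) throw new RangeError('compressionRatio must be positive')
  return Math.ceil(baselineTokens / compressionRatio)
}

// Cost in USD for a token count at a per-million-token price.
function estimateCostUSD(tokens: number, pricePerMillionUSD: number): number {
  return (tokens / 1e6) * pricePerMillionUSD
}

const baselineTokens = 3000000 // hypothetical Arabic corpus under an older tokenizer
const tekkenTokens = tokensAfterCompression(baselineTokens, 3)

console.log(
  estimateCostUSD(baselineTokens, 0.15).toFixed(2),
  estimateCostUSD(tekkenTokens, 0.15).toFixed(2),
)
```

Actual savings depend on the text being tokenized, so it is worth measuring token counts on a representative sample.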

At 12B parameters with a context window of 131.1K tokens, Mistral Nemo serves as a drop-in replacement for Mistral 7B, with improved instruction following, multi-turn conversation quality, and code generation. Quantization-aware training enables FP8 inference without performance degradation. Combined with Tekken's compression, this lowers both the memory needed to serve the model and the number of tokens processed per request.
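
As a back-of-the-envelope illustration of the FP8 point, the sketch below compares weight storage at 16 bits and at 8 bits per parameter. It assumes a 16-bit baseline such as BF16 and treats "12B" as exactly 12 billion parameters. It counts weights only, ignoring the KV cache, activations, and runtime overhead. `weightMemoryGB` is a hypothetical helper name.

```typescript
// Weights-only memory estimate in gigabytes (1 GB = 1e9 bytes).
function weightMemoryGB(parameterCount: number, bitsPerParameter: number): number {
  const bytes = (parameterCount * bitsPerParameter) / 8
  return bytes / 1e9
}

const parameterCount = 12e9 // "12B" taken as a round figure (assumption)

console.log('16-bit weights (GB):', weightMemoryGB(parameterCount, 16))
console.log('FP8 weights (GB):', weightMemoryGB(parameterCount, 8))
```

Halving the bits per parameter halves the weight footprint, which is why FP8 can fit the same model on smaller hardware.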

Mistral Nemo's multilingual coverage spans English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. Mistral Nemo is available under Apache 2.0, with both base and instruct weights on HuggingFace.

What To Consider When Choosing a Provider

  • Configuration: The Tekken tokenizer's compression efficiency means that for code-heavy or non-Latin-script workloads, you use fewer tokens per request than with models using conventional tokenizers.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
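
The authentication point can be sketched as a small credential lookup. The environment variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN` are assumptions for illustration, as is preferring the API key when both are present. `resolveGatewayAuth` is a hypothetical helper, not part of any SDK. Check the AI Gateway documentation for the exact names and precedence.

```typescript
type GatewayAuth = { kind: 'api-key'; token: string } | { kind: 'oidc'; token: string }

// Picks a gateway credential from an environment-like map.
// Variable names and precedence are assumptions of this sketch.
function resolveGatewayAuth(env: Record<string, string | undefined>): GatewayAuth {
  const apiKey = env['AI_GATEWAY_API_KEY']
  if (apiKey) return { kind: 'api-key', token: apiKey }

  const oidcToken = env['VERCEL_OIDC_TOKEN']
  if (oidcToken) return { kind: 'oidc', token: oidcToken }

  throw new Error('No AI Gateway credential found: provide an API key or an OIDC token')
}

console.log(resolveGatewayAuth({ AI_GATEWAY_API_KEY: 'example-key' }).kind)
```

In a Node.js process the function could be called with `process.env`. No provider-specific key is needed, because the gateway handles provider credentials.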

When to Use Mistral Nemo

Best For

  • Multilingual applications: Spanning European, East Asian, and Arabic-script languages
  • Code-heavy workloads: Tekken's ~30% better source code compression reduces token costs
  • Korean or Arabic applications: Roughly 2x (Korean) and 3x (Arabic) better tokenizer compression cuts token counts substantially
  • Mistral 7B migrations: Deployments that need a larger context window

Consider Alternatives When

  • Larger general-purpose headroom: You need more capacity (consider Mistral Large 3)
  • Code generation or agentic coding: Coding is the primary task (consider Devstral or Codestral)
  • Higher reasoning depth: Tasks that require deeper reasoning traces (consider Magistral models)

Conclusion

Mistral Nemo's Tekken tokenizer is its distinguishing technical contribution, delivering efficiency gains for code and non-Latin-script languages that translate into lower costs per task. For multilingual applications and code-heavy pipelines, those gains compound at scale.

Frequently Asked Questions

  • What is the Tekken tokenizer?

    Tekken is a tokenizer trained on 100+ languages, introduced with Mistral Nemo. Tekken achieves ~30% better source code compression, 2x better compression for Korean, and 3x better compression for Arabic compared to previous Mistral AI tokenizers.

  • What is the context window for Mistral Nemo?

    131.1K tokens.

  • Is Mistral Nemo a drop-in replacement for Mistral 7B?

    Yes. Mistral AI positions it as a drop-in upgrade with the same architecture family, improved quality, and a larger context window.

  • What languages does Mistral Nemo support?

    English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi, among others.

  • What is FP8 inference and how does quantization-aware training help?

    FP8 is a reduced-precision number format that speeds inference and reduces memory usage. Quantization-aware training means the model was trained to tolerate FP8 quantization, so accuracy doesn't degrade compared to full-precision inference.

  • What is the license for Mistral Nemo?

    Apache 2.0, permitting commercial use and modification.

  • Who built Mistral Nemo?

    Mistral AI in collaboration with NVIDIA, as indicated by the NeMo branding aligned with NVIDIA's NeMo framework ecosystem.