
Mistral Nemo

mistral/mistral-nemo

Mistral Nemo is a 12B-parameter model with a 131.1K-token context window. It uses the Tekken tokenizer, trained on 100+ languages, which delivers ~30% better source-code compression and improved multilingual efficiency, and it is positioned as a drop-in replacement for Mistral 7B.

Tool Use
index.ts
import { streamText } from 'ai';

const result = streamText({
  model: 'mistral/mistral-nemo',
  prompt: 'Why is the sky blue?',
});

// Print the streamed response as it arrives.
for await (const text of result.textStream) {
  process.stdout.write(text);
}
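The snippet above streams a plain completion. For actual tool use, here is a minimal sketch with the AI SDK's tool helper; the getWeather tool, its schema, and the stubbed result are illustrative assumptions, not part of the gateway API:

import { generateText, tool, stepCountIs } from 'ai';
import { z } from 'zod';

const { text } = await generateText({
  model: 'mistral/mistral-nemo',
  tools: {
    // Hypothetical tool: returns canned weather data for a city.
    getWeather: tool({
      description: 'Get the current weather for a city',
      inputSchema: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, tempC: 21, sky: 'clear' }),
    }),
  },
  stopWhen: stepCountIs(2), // one tool round-trip, then a final answer
  prompt: 'What is the weather in Paris right now?',
});

console.log(text);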

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

  • Authentication

    AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
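For local development, a minimal sketch of explicit key configuration with the AI SDK's gateway provider; reading the key from AI_GATEWAY_API_KEY is an assumption about your environment, and on Vercel deployments an OIDC token can stand in for the key:

import { createGateway } from '@ai-sdk/gateway';
import { generateText } from 'ai';

// Assumes AI_GATEWAY_API_KEY is set; if you skip createGateway entirely,
// the default provider resolves the key (or an OIDC token) on its own.
const gateway = createGateway({
  apiKey: process.env.AI_GATEWAY_API_KEY,
});

const { text } = await generateText({
  model: gateway('mistral/mistral-nemo'),
  prompt: 'Say hello in French.',
});

console.log(text);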

The Tekken tokenizer's compression efficiency means that for code-heavy or non-Latin-script workloads, you use fewer tokens per request than with Mistral's previous tokenizers, which lowers cost and frees context-window headroom; the rough estimate below illustrates the effect.
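A back-of-the-envelope sketch of what the quoted ratios imply; the ratios come from the figures on this page, and the workload sizes are made-up examples:

// Claimed compression vs. previous Mistral tokenizers:
// ~30% fewer tokens for source code, ~2x for Korean, ~3x for Arabic.
const ratios = { code: 0.7, korean: 1 / 2, arabic: 1 / 3 };

// Hypothetical workloads, sized in tokens under the older tokenizer.
const workloads = [
  ['code', 10_000],
  ['korean', 4_000],
  ['arabic', 6_000],
] as const;

for (const [kind, oldTokens] of workloads) {
  const newTokens = Math.round(oldTokens * ratios[kind]);
  console.log(`${kind}: ${oldTokens} -> ~${newTokens} tokens`);
}
// code: 10000 -> ~7000, korean: 4000 -> ~2000, arabic: 6000 -> ~2000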

When to Use Mistral Nemo

Best For

  • Multilingual applications:

    Spanning European, East Asian, and Arabic-script languages

  • Code-heavy workloads:

    Tekken's ~30% source-code compression advantage reduces token costs

  • Korean or Arabic applications:

    Tekken's ~2x (Korean) and ~3x (Arabic) compression gains are significant at scale

  • Mistral 7B migrations:

    Deployments that need a larger context window

Consider Alternatives When

  • Larger general-purpose headroom:

    You need more capacity (consider Mistral AI Large 3)

  • Code generation or agentic coding:

    Coding is the primary task (consider Devstral or Codestral)

  • Higher reasoning depth:

    Tasks that require deeper reasoning traces (consider Magistral models)

Conclusion

Mistral Nemo's Tekken tokenizer is its distinguishing technical contribution, delivering efficiency gains for code and non-Latin-script languages that translate into lower costs per task. For multilingual applications and code-heavy pipelines, those gains compound at scale.

FAQ

What is the Tekken tokenizer?

Tekken is a tokenizer trained on 100+ languages, introduced with Mistral Nemo. It achieves ~30% better source-code compression, 2x better compression for Korean, and 3x better compression for Arabic compared to previous Mistral AI tokenizers.

How large is Mistral Nemo's context window?

131.1K tokens.

Is Mistral Nemo a drop-in replacement for Mistral 7B?

Yes. Mistral AI positions it as a drop-in upgrade with the same architecture family, improved quality, and a larger context window.

Which languages does Mistral Nemo support?

English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi, among others.

What do FP8 and quantization-aware training mean?

FP8 is a reduced-precision number format that speeds up inference and reduces memory usage. Quantization-aware training means the model was trained to tolerate FP8 quantization, so accuracy doesn't degrade compared to full-precision inference.
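A quick sketch of what the precision implies for weight memory; the byte widths are standard for each format, and the 12B parameter count comes from the model description above:

// Approximate weight-memory footprint of a 12B-parameter model,
// ignoring activations, KV cache, and runtime overhead.
const PARAMS = 12e9;
const bytesPerParam = { fp32: 4, fp16: 2, fp8: 1 };

for (const [format, bytes] of Object.entries(bytesPerParam)) {
  console.log(`${format}: ~${(PARAMS * bytes) / 1e9} GB of weights`);
}
// fp32: ~48 GB, fp16: ~24 GB, fp8: ~12 GB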

What license is Mistral Nemo released under?

Apache 2.0, permitting commercial use and modification.

Who developed Mistral Nemo?

Mistral AI in collaboration with NVIDIA, as indicated by the NeMo branding aligned with NVIDIA's NeMo framework ecosystem.