
Pixtral 12B 2409

Pixtral 12B 2409 is a natively multimodal model with a 400M-parameter vision encoder and a 128K-token context window. It processes images at native resolution and supports multiple images per request.

Capabilities: Tool Use, Vision (Image)
index.ts
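import { streamText } from 'ai'
const result = streamText({
  model: 'mistral/pixtral-12b',
  prompt: 'Why is the sky blue?',
})

// Print text as it streams in (top-level await needs an ES module)
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}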

Playground

Try out Pixtral 12B 2409 by Mistral AI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

  • Provider: Mistral AI (Legal: Terms, Privacy)
  • Context: 128K
  • Latency: 0.2s
  • Throughput: 85 tps
  • Input: $0.15/M
  • Output: $0.15/M
  • Release Date: 09/01/2024
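
To make the listed rate concrete, here is a minimal sketch that estimates the cost of one request at $0.15 per million tokens for both input and output. The helper name `estimateCostUsd` and the example token counts are hypothetical, and the sketch assumes image tokens are billed at the same input rate as text tokens, which this page does not state.

```typescript
// Listed rates for Pixtral 12B 2409 via Mistral AI, in USD per million tokens.
const INPUT_USD_PER_MILLION = 0.15;
const OUTPUT_USD_PER_MILLION = 0.15;

// Hypothetical helper: estimates the cost of one request from its token counts.
// ASSUMPTION: image tokens count as ordinary input tokens.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError('token counts must be non-negative');
  }
  const inputCost = inputTokens * INPUT_USD_PER_MILLION;
  const outputCost = outputTokens * OUTPUT_USD_PER_MILLION;
  return (inputCost + outputCost) / 1_000_000;
}

// Example: 10,000 input tokens and 500 output tokens.
const exampleCostUsd = estimateCostUsd(10_000, 500);
```

For the example above the estimate works out to roughly $0.0016 (10,500 tokens at $0.15 per million).
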
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Mistral AI

  • Context 256K; latency 0.3s; throughput 75 tps; input $0.40/M; output $2.00/M; provider: Mistral; released 12/09/2025
  • Context 256K; latency 0.2s; throughput 83 tps; input $0.20/M; output $0.20/M; provider: Mistral; released 12/01/2025
  • Context 128K; latency 0.2s; throughput 197 tps; input $0.10/M; output $0.10/M; provider: Mistral; released 10/01/2024
  • Context 32K; latency 0.4s; throughput 123 tps; input $0.10/M; output $0.30/M; provider: Mistral; released 09/01/2024
  • Context 131K; latency 0.2s; throughput 123 tps; input $0.15/M; output $0.15/M; providers: DeepInfra, Mistral, Novita; released 07/01/2024
  • Price $0.10/M; provider: Mistral; released 12/11/2023

About Pixtral 12B 2409

Pixtral 12B 2409 introduced multimodal capability to the Mistral AI lineup with a clean architectural split. A 400M-parameter vision encoder trained from scratch handles image understanding, while a 12B decoder based on Mistral NeMo handles text generation. The two components were trained together on interleaved image-text data, so visual and textual understanding are integrated rather than bolted on.

The context window of 128K tokens accommodates multiple images alongside text in a single request. You can compare images, trace visual changes across a document, or cross-reference diagrams with their written descriptions. Variable aspect ratio support processes images at their native dimensions, which matters for charts, documents, and technical schematics where distortion degrades accuracy.
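
As a rough illustration of how native-resolution processing ties token usage to image dimensions, the sketch below estimates the tokens one image consumes and how many such images fit in the context window. The constants are assumptions, not figures from this page: it assumes 16x16 pixel patches, one token per patch, one separator token per patch row, and a context window of exactly 128,000 tokens. The function names are hypothetical, and any server-side resizing of very large images is ignored.

```typescript
// ASSUMPTION: 16x16 pixel patches, one token per patch, plus one separator
// token per patch row. These values are not stated on this page.
const PATCH_SIZE_PX = 16;

// ASSUMPTION: "128K" is treated as exactly 128,000 tokens.
const CONTEXT_WINDOW_TOKENS = 128_000;

// Hypothetical helper: approximate token count for one image at native size.
function estimateImageTokens(widthPx: number, heightPx: number): number {
  if (widthPx <= 0 || heightPx <= 0) {
    throw new RangeError('image dimensions must be positive');
  }
  const cols = Math.ceil(widthPx / PATCH_SIZE_PX);
  const rows = Math.ceil(heightPx / PATCH_SIZE_PX);
  return rows * cols + rows; // patch tokens plus one separator per row
}

// Hypothetical helper: how many same-sized images fit once some tokens
// are reserved for the text prompt and the response.
function imagesThatFit(widthPx: number, heightPx: number, reservedTextTokens: number): number {
  const available = CONTEXT_WINDOW_TOKENS - reservedTextTokens;
  return Math.max(0, Math.floor(available / estimateImageTokens(widthPx, heightPx)));
}

const squareTokens = estimateImageTokens(512, 512);    // 32 x 32 patches + 32 separators
const wideChartTokens = estimateImageTokens(1024, 256); // 64 x 16 patches + 16 separators
```

Under these assumptions a 512x512 image and a wide 1024x256 chart cost a similar number of tokens (1,056 and 1,040), because the estimate follows the pixel area rather than forcing both into one fixed shape.
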

Pixtral 12B 2409 scores 52.5% on MMMU and shows a 20% relative improvement in instruction following over comparable open-source multimodal models. It ships under Apache 2.0. Mistral AI has designated the model as deprecated in favor of newer vision models, though it remains available through AI Gateway for existing integrations.

What To Consider When Choosing a Provider

  • Configuration: Pixtral 12B 2409 processes images at native resolution, adapting token allocation to actual image dimensions. Models that resize every image to a fixed token budget waste tokens on small images and lose detail on large ones.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
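
The Authentication point above can be sketched as a small helper that picks whichever credential is available and builds a request header from it. This is only an illustration: the environment variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN`, the bearer-token header scheme, and the helper name `buildAuthHeaders` are assumptions to verify against the AI Gateway documentation.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: prefers an API key and falls back to an OIDC token.
// ASSUMPTION: the variable names and the "Bearer" scheme are illustrative.
function buildAuthHeaders(env: Env): Record<string, string> {
  const token = env['AI_GATEWAY_API_KEY'] || env['VERCEL_OIDC_TOKEN'];
  if (!token) {
    throw new Error('no AI Gateway credential found (API key or OIDC token)');
  }
  return { Authorization: `Bearer ${token}` };
}

// Example with a placeholder key; in a real app the environment would be passed in.
const exampleHeaders = buildAuthHeaders({ AI_GATEWAY_API_KEY: 'example-key' });
```

Note that no provider-specific credential (such as a Mistral API key) appears anywhere in the sketch, matching the point that the gateway handles provider credentials.
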

When to Use Pixtral 12B 2409

Best For

  • Document question answering: Tasks where text and visual layout matter together
  • Chart and graph interpretation: Tasks requiring both visual parsing and textual explanation
  • Multi-image workflows: Several images in a single request, up to the 128K-token context window
  • Native multimodal migration: Applications moving away from vision-adapted text models
  • Apache 2.0 vision licensing: Teams that need this license for a vision-capable model
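
For the multi-image workflows listed above, one possible way to package several images and a question into a single user message is sketched below. The part shapes (`type: 'text'` and `type: 'image'`) are modeled on the AI SDK's message format but are declared locally here, so treat them as an assumption and check them against the SDK reference. The helper name `buildMultiImageMessage` and the example URLs are hypothetical.

```typescript
// Local types modeled on AI SDK message parts (ASSUMPTION: verify against the SDK).
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image'; image: URL };
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> };

// Hypothetical helper: one text question followed by one image part per URL.
function buildMultiImageMessage(question: string, imageUrls: string[]): UserMessage {
  if (imageUrls.length === 0) {
    throw new Error('at least one image URL is required');
  }
  return {
    role: 'user',
    content: [
      { type: 'text', text: question },
      ...imageUrls.map((url): ImagePart => ({ type: 'image', image: new URL(url) })),
    ],
  };
}

// Example: compare two charts in one request (placeholder URLs).
const message = buildMultiImageMessage('What changed between these two charts?', [
  'https://example.com/chart-q1.png',
  'https://example.com/chart-q2.png',
]);
```

A message built this way would typically be passed as `messages: [message]` in place of the `prompt` field in a `streamText` call.
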

Consider Alternatives When

  • Higher vision accuracy: You need top-tier performance (consider Pixtral Large)
  • Text-only workloads: Vision capability adds no value
  • Higher math or reasoning: You need stronger benchmark scores alongside vision

Conclusion

Pixtral 12B 2409 brought native multimodality to the Mistral AI family by integrating vision and language from the ground up, with a 400M-parameter vision encoder trained alongside the text decoder. Mistral AI has since deprecated the model, but it remains available through AI Gateway for teams with existing integrations.

Frequently Asked Questions

  • Is Pixtral 12B 2409 still actively maintained?

    Mistral AI has designated Pixtral 12B 2409 as deprecated in favor of newer vision models. Pixtral 12B 2409 remains accessible through AI Gateway for existing integrations.

  • How does Pixtral 12B 2409 handle images at native resolution?

    Pixtral 12B 2409 dynamically allocates tokens based on actual image dimensions rather than resizing to a fixed budget. This preserves detail in high-resolution charts, documents, and schematics.

  • How many images can Pixtral 12B 2409 process in one request?

    Multiple images are supported in one request. The limit is the total token budget: images and text together must fit in the 128K-token context window.

  • What is the vision encoder architecture?

    The vision encoder has 400 million parameters and was trained from scratch on visual data rather than adapted from a pre-existing image model.

  • What is Pixtral 12B 2409's MMMU score?

    52.5%.

  • What is the text decoder based on?

    Mistral NeMo (12B), which provides the text generation and instruction-following capabilities.

  • How does Pixtral 12B 2409 compare to Pixtral Large?

    Pixtral Large (124B) is built on Mistral Large 2 and outperforms Pixtral 12B 2409 on document understanding, chart analysis, and mathematical vision tasks. Pixtral 12B 2409 is cheaper to run for inference because of its smaller 12B parameter count.