Pixtral 12B 2409
Pixtral 12B 2409 is a natively multimodal model with a 400M-parameter vision encoder and a 128K-token context window. It processes images at their native resolution and supports multiple images per request.
```typescript
import { streamText } from 'ai'

const result = streamText({ model: 'mistral/pixtral-12b', prompt: 'Why is the sky blue?' })
for await (const chunk of result.textStream) process.stdout.write(chunk)
```
What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
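The gateway resolves credentials for you, but the lookup order can be sketched as follows. This is a minimal illustration, not the SDK's actual resolution code; `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN` are the conventional environment variable names, but verify them against the Gateway documentation.

```typescript
// Minimal sketch of credential resolution: prefer an explicit API key,
// fall back to an OIDC token. Env var names are assumptions — verify
// against the AI Gateway documentation.
function gatewayAuthHeader(
  env: Record<string, string | undefined>
): Record<string, string> {
  const credential = env.AI_GATEWAY_API_KEY ?? env.VERCEL_OIDC_TOKEN
  if (!credential) {
    throw new Error('No AI Gateway credential found in environment')
  }
  return { Authorization: `Bearer ${credential}` }
}

console.log(gatewayAuthHeader({ AI_GATEWAY_API_KEY: 'example-key' }))
// → { Authorization: 'Bearer example-key' }
```

In practice you only set one of the two environment variables and let the SDK attach the header itself.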
Unlike models that resize every image to fit a fixed token budget, Pixtral 12B 2409 processes images at native resolution and adapts token allocation to each image's actual dimensions, so no tokens are wasted on padding or lost to downscaling.
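As a rough illustration of why this matters: Pixtral's encoder tokenizes images as 16×16-pixel patches, so token cost scales with real dimensions. The sketch below ignores separator tokens and any maximum-resolution cap, so treat it as an approximation rather than the exact billing formula.

```typescript
// Rough illustration: Pixtral tokenizes images as 16x16-pixel patches,
// so token count tracks the real image size instead of a fixed budget.
// (Ignores separator tokens and any maximum-resolution cap.)
const PATCH = 16

function approxImageTokens(width: number, height: number): number {
  return Math.ceil(width / PATCH) * Math.ceil(height / PATCH)
}

console.log(approxImageTokens(512, 512))  // → 1024
console.log(approxImageTokens(1024, 768)) // → 3072
console.log(approxImageTokens(64, 64))    // small icon stays cheap: 16
```

A small icon costs a handful of tokens while a dense, high-resolution chart keeps its detail, which is the trade-off fixed-budget resizing cannot make.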
When to Use Pixtral 12B 2409
Best For
Document question answering:
Text and visual layout matter together
Chart and graph interpretation:
Requiring both visual parsing and textual explanation
Multi-image workflows:
Combining several images in a single request, using the full 128K-token context
Native multimodal migration:
For applications moving from vision-adapted text models
Apache 2.0 vision licensing:
Teams requiring this license for a vision-capable model
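As an example of the multi-image workflows above, several image parts can ride in one user message. The part shape follows the AI SDK's `messages` format; the URLs and prompt are placeholders.

```typescript
// Two placeholder chart images compared in a single request.
const messages = [
  {
    role: 'user' as const,
    content: [
      { type: 'text' as const, text: 'Compare the trends in these two charts.' },
      { type: 'image' as const, image: new URL('https://example.com/chart-a.png') },
      { type: 'image' as const, image: new URL('https://example.com/chart-b.png') },
    ],
  },
]

// With a gateway credential configured, the request would look like:
//   const { text } = await generateText({ model: 'mistral/pixtral-12b', messages })
const imageParts = messages[0].content.filter((p) => p.type === 'image')
console.log(imageParts.length) // → 2
```

All images in the request share the single 128K-token context, so the practical limit on image count is the total token budget.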
Consider Alternatives When
Higher vision accuracy:
You need top-tier performance (consider Pixtral Large)
Text-only workloads:
Vision capability adds no value
Stronger math or reasoning:
You need higher benchmark scores alongside vision capability
Conclusion
Pixtral 12B 2409 brought native multimodality to the Mistral AI family, integrating vision and language from the ground up through a 400M-parameter encoder trained alongside the text decoder. Mistral AI has since deprecated the model, but it remains available through AI Gateway for teams with existing integrations.
FAQ
Is Pixtral 12B 2409 deprecated?
Yes. Mistral AI has designated Pixtral 12B 2409 as deprecated in favor of newer vision models. It remains accessible through AI Gateway for existing integrations.
How does Pixtral 12B 2409 handle high-resolution images?
It dynamically allocates tokens based on actual image dimensions rather than resizing to a fixed budget, which preserves detail in high-resolution charts, documents, and schematics.
How many images can be included per request?
Multiple images are supported within the 128K-token context window; the practical limit is the total token budget.
How large is the vision encoder?
400 million parameters, trained from scratch on visual data rather than adapted from a pre-existing image model.
What does Pixtral 12B 2409 score on MMMU?
52.5%.
Which text model provides the language backbone?
Mistral NeMo (12B), which provides the text generation and instruction-following capabilities.
Pixtral Large (124B) is built on Mistral AI Large 2 and outperforms Pixtral 12B 2409 on document understanding, chart analysis, and mathematical vision tasks. Pixtral 12B 2409 is cheaper to run at its 12B parameter count.