Pixtral 12B 2409
Pixtral 12B 2409 is a natively multimodal model pairing a 400M-parameter vision encoder with a 128K-token context window. It processes images at their native resolution and supports multiple images per request.
import { streamText } from 'ai'

const result = streamText({
  model: 'mistral/pixtral-12b',
  prompt: 'Why is the sky blue?',
})

Frequently Asked Questions
Is Pixtral 12B 2409 still actively maintained?
Mistral AI has designated Pixtral 12B 2409 as deprecated in favor of newer vision models. Pixtral 12B 2409 remains accessible through AI Gateway for existing integrations.
How does Pixtral 12B 2409 handle images at native resolution?
Pixtral 12B 2409 dynamically allocates tokens based on actual image dimensions rather than resizing to a fixed budget. This preserves detail in high-resolution charts, documents, and schematics.
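As a rough sketch of what native-resolution token allocation implies for cost, the function below estimates the token count of an image assuming Pixtral's published 16x16-pixel patches with one break token per patch row; this is an illustrative approximation, not an official tokenizer formula:

```typescript
// Rough token-cost estimate for an image at native resolution.
// Assumption: 16x16-pixel patches, plus one image-break token per
// row of patches (an approximation for illustration only).
const PATCH_SIZE = 16

function estimateImageTokens(width: number, height: number): number {
  const cols = Math.ceil(width / PATCH_SIZE)
  const rows = Math.ceil(height / PATCH_SIZE)
  return rows * (cols + 1) // +1 break token per patch row
}

// Because nothing is downscaled to a fixed budget, a full-resolution
// chart costs roughly 4x the tokens of a half-resolution copy.
console.log(estimateImageTokens(1024, 1024)) // 4160
console.log(estimateImageTokens(512, 512)) // 1056
```

The practical takeaway: sending a high-resolution document preserves fine detail but consumes proportionally more of the 128K-token budget.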
How many images can Pixtral 12B 2409 process in one request?
There is no fixed per-request image cap: you can include multiple images as long as their combined tokens, together with the text, fit within the 128K-token context window.
What is the vision encoder architecture?
400 million parameters, trained from scratch on visual data, not adapted from a pre-existing image model.
What is Pixtral 12B 2409's MMMU score?
52.5%.
What is the text decoder based on?
Mistral NeMo (12B), which provides the text generation and instruction-following capabilities.
How does Pixtral 12B 2409 compare to Pixtral Large?
Pixtral Large (124B) is built on Mistral Large 2 and outperforms Pixtral 12B 2409 on document understanding, chart analysis, and mathematical vision tasks. Pixtral 12B 2409 is cheaper to run at its 12B parameter count.