Llama 3.2 90B Vision Instruct
Llama 3.2 90B Vision Instruct is Meta's highest-capability vision-language model from the Llama 3.2 launch. It pairs large-scale language generation with image reasoning, a 128K-token context window, and support for complex multi-element visual analysis.
```ts
import { streamText } from 'ai'

const result = streamText({
  model: 'meta/llama-3.2-90b',
  prompt: 'Why is the sky blue?',
})
```

Frequently Asked Questions
What makes Llama 3.2 90B Vision Instruct more capable than Llama 3.2 11B for vision tasks?
The 90B language-model foundation has more capacity for complex reasoning, varied language generation, and handling difficult multi-element visual scenes. The gap shows most on tasks that combine intricate image understanding with long-form text generation; it is smaller on simple visual QA.
How does the 128K-token context window work with image inputs?
Images are encoded as token sequences and consume context budget just like text. A long conversation with multiple images and an extended text history can be held within the 128K-token window, though very high-resolution images encode to more tokens than low-resolution ones.
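As a rough sketch using the AI SDK, a multi-image conversation is just a messages array; every image part is encoded into tokens that count against the same 128K budget as the surrounding text turns. The URLs and the assistant reply below are placeholders:

```ts
import { generateText } from 'ai'

const result = await generateText({
  model: 'meta/llama-3.2-90b',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What does this figure show?' },
        // Placeholder URL, for illustration only.
        { type: 'image', image: new URL('https://example.com/figure-1.png') },
      ],
    },
    {
      // Placeholder assistant turn; its text also occupies context tokens.
      role: 'assistant',
      content: 'The figure shows quarterly revenue by region.',
    },
    {
      role: 'user',
      content: [
        { type: 'text', text: 'How does this second figure differ?' },
        // A second image adds its own encoded tokens to the running context.
        { type: 'image', image: new URL('https://example.com/figure-2.png') },
      ],
    },
  ],
})

console.log(result.text)
```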
What types of documents and images does Llama 3.2 90B Vision Instruct handle best?
Technical documents with figures, research papers with charts and equations, medical imaging contexts, and complex multi-element scenes benefit most from the 90B-scale reasoning capacity. Standard photography and simple diagrams can typically be handled by the 11B variant.
When was Llama 3.2 90B Vision Instruct released?
Meta released Llama 3.2 90B Vision Instruct on September 25, 2024.
How do I use Llama 3.2 90B Vision Instruct on AI Gateway?
Use the identifier `meta/llama-3.2-90b` with any supported interface. You can send image inputs alongside text in the same request. Send the request through AI Gateway; it routes requests across providers and fails over automatically.
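A minimal sketch of a single vision request: the `meta/llama-3.2-90b` identifier is the one documented above, while the image URL is a placeholder.

```ts
import { streamText } from 'ai'

const result = streamText({
  model: 'meta/llama-3.2-90b',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Summarize the chart in this image.' },
        // Placeholder URL; the AI SDK also accepts binary image data here.
        { type: 'image', image: new URL('https://example.com/chart.png') },
      ],
    },
  ],
})

// Stream the model's answer as it is generated.
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}
```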