Llama 3.2 90B Vision Instruct

Llama 3.2 90B Vision Instruct is the highest-capability vision-language model Meta released at the Llama 3.2 launch. It pairs large-scale language generation with image reasoning, offers a 128K-token context window, and supports analysis of images that contain multiple elements.

Capabilities: Tool Use, Vision (Image)
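index.ts
import { streamText } from 'ai'

// Model ID in provider/model form, as given on this page.
const result = streamText({
  model: 'meta/llama-3.2-90b',
  prompt: 'Why is the sky blue?',
})

// Print the response text as it streams in.
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}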
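
Because the model is listed with the Vision (Image) capability, a request can pair a question with an image. The following is a minimal sketch of how such a message might be assembled. `buildVisionMessages` and the local type aliases are hypothetical names introduced here for illustration, and the example URL is a placeholder. The message shape (a user message whose content is a list of `text` and `image` parts) follows the AI SDK's multimodal message format, but whether a given provider accepts image URLs rather than raw bytes is an assumption to verify.

```typescript
// Hypothetical local types mirroring the text/image content parts of a user message.
type TextPart = { type: 'text'; text: string }
type ImagePart = { type: 'image'; image: URL }
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> }

// Hypothetical helper: builds one user message that combines a question and an image.
function buildVisionMessages(question: string, imageUrl: string): UserMessage[] {
  return [
    {
      role: 'user',
      content: [
        { type: 'text', text: question },
        // new URL() throws on malformed input, so bad URLs fail early.
        { type: 'image', image: new URL(imageUrl) },
      ],
    },
  ]
}

// Placeholder question and image URL, for illustration only.
const messages = buildVisionMessages(
  'What does this chart show?',
  'https://example.com/chart.png',
)
console.log(JSON.stringify(messages, null, 2))
```

The resulting array would be passed to `streamText` as `messages` in place of the `prompt` string used in index.ts.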

More models by Meta

Context  Latency  Throughput  Input    Output   Cache         Providers                         Release Date
131K     0.2s     19 tps      $0.24/M  $0.97/M  -             Bedrock, DeepInfra                04/05/2025
131K     0.2s     207 tps     $0.17/M  $0.66/M  -             Bedrock, DeepInfra, Groq          04/05/2025
128K     0.2s     143 tps     $0.59/M  $0.72/M  -             Bedrock, Groq                     12/06/2024
128K     0.3s     53 tps      $0.15/M  $0.15/M  -             Bedrock                           09/18/2024
131K     0.1s     150 tps     $0.10/M  $0.10/M  Read $0.1/M   Bedrock, Cerebras, DeepInfra, +2  07/23/2024
131K     0.3s     32 tps      $0.72/M  $0.72/M  -             Bedrock, DeepInfra                07/23/2024
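
The Input and Output prices are quoted per million tokens ("/M"), so the cost of a single request follows from simple arithmetic. The sketch below illustrates that calculation. `estimateCostUsd` is a hypothetical helper, the token counts are made-up example values, and the prices are taken from one listing ($0.24/M input, $0.97/M output). It ignores cache pricing and any other fees.

```typescript
// Hypothetical helper: estimates request cost from per-million-token prices.
function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  inputPricePerM: number,
  outputPricePerM: number,
): number {
  const perToken = 1_000_000
  return (
    (inputTokens / perToken) * inputPricePerM +
    (outputTokens / perToken) * outputPricePerM
  )
}

// Example values only: 10,000 input tokens and 2,000 output tokens
// at $0.24/M input and $0.97/M output.
const cost = estimateCostUsd(10_000, 2_000, 0.24, 0.97)
console.log(`$${cost.toFixed(5)}`) // prints "$0.00434"
```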