Llama 3.2 11B Vision Instruct

Llama 3.2 11B Vision Instruct is Meta's entry point for vision-language capability in the Llama 3.2 family. This 11B-parameter model adds image understanding through a cross-attention adapter, making it an accessible base for multimodal applications.
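The page names a cross-attention adapter but gives no details of it. As a generic illustration only (not Meta's implementation), single-head scaled dot-product cross-attention lets each text token's hidden state attend over image-encoder features: queries come from the text side, keys and values from the image side. The matrix sizes, projection weights, and function names below are hypothetical.

```typescript
// Generic single-head cross-attention sketch (illustrative; not Llama's actual code).
type Matrix = number[][]

function matmul(a: Matrix, b: Matrix): Matrix {
  const rows = a.length
  const inner = b.length
  const cols = b[0].length
  const out: Matrix = Array.from({ length: rows }, () => new Array<number>(cols).fill(0))
  for (let i = 0; i < rows; i++) {
    for (let k = 0; k < inner; k++) {
      for (let j = 0; j < cols; j++) {
        out[i][j] += a[i][k] * b[k][j]
      }
    }
  }
  return out
}

function transpose(m: Matrix): Matrix {
  return m[0].map((_, j) => m.map(row => row[j]))
}

// Numerically stable softmax applied independently to each row
function softmaxRows(m: Matrix): Matrix {
  return m.map(row => {
    const max = Math.max(...row)
    const exps = row.map(x => Math.exp(x - max))
    const sum = exps.reduce((s, x) => s + x, 0)
    return exps.map(x => x / sum)
  })
}

// textHidden: [numTextTokens x d], imageFeatures: [numImagePatches x d]
// wq, wk, wv: hypothetical [d x dHead] projection matrices
function crossAttention(
  textHidden: Matrix,
  imageFeatures: Matrix,
  wq: Matrix,
  wk: Matrix,
  wv: Matrix,
): Matrix {
  const q = matmul(textHidden, wq)    // queries from text tokens
  const k = matmul(imageFeatures, wk) // keys from image patches
  const v = matmul(imageFeatures, wv) // values from image patches
  const dHead = wq[0].length
  const scores = matmul(q, transpose(k)).map(row => row.map(s => s / Math.sqrt(dHead)))
  const weights = softmaxRows(scores) // each text token's distribution over image patches
  return matmul(weights, v)           // [numTextTokens x dHead] image-informed outputs
}
```

Because the attention weights in each row sum to 1, every text token receives a weighted average of the image-patch values; a real adapter would add multiple heads, learned weights, and a path back into the language model's residual stream.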

Tool Use, Vision (Image)

index.ts

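import { streamText } from 'ai'

// Stream a response from Llama 3.2 11B Vision Instruct
const result = streamText({
  model: 'meta/llama-3.2-11b',
  prompt: 'Why is the sky blue?'
})

// Write each text chunk as it arrives (top-level await needs an ES module)
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}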
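The snippet above sends text only. To exercise the model's image input, a user turn can carry both a text part and an image part. The sketch below builds such a message with local types modeled on the AI SDK's user-message content parts; the helper name `buildImageQuestion` and the image URL are hypothetical placeholders.

```typescript
// Local types modeled on AI SDK user-message content parts (assumption: text + image parts)
type TextPart = { type: 'text'; text: string }
type ImagePart = { type: 'image'; image: URL }
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> }

// Hypothetical helper: pair a question with one image URL in a single user turn
function buildImageQuestion(question: string, imageUrl: string): UserMessage {
  return {
    role: 'user',
    content: [
      { type: 'text', text: question },
      // new URL() throws a TypeError if imageUrl is not a valid absolute URL
      { type: 'image', image: new URL(imageUrl) },
    ],
  }
}

const message = buildImageQuestion(
  'What is shown in this picture?',
  'https://example.com/photo.jpg', // placeholder URL
)
console.log(JSON.stringify(message))
```

With the AI SDK, a message like this would be passed as `messages: [message]` to `streamText` in place of the `prompt` field.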


More models by Meta

Model	Context	Latency	Throughput	Input, Output	Cache (Read, Write)	Web Search (Per Query)	Release Date
meta/llama-4-maverick	131K	0.2s	152 tps	$0.24/M, $0.97/M	—	—	04/05/2025
meta/llama-4-scout	131K	0.2s	164 tps	$0.17/M, $0.66/M	—	—	04/05/2025
meta/llama-3.3-70b	128K	0.2s	187 tps	$0.59/M, $0.72/M	—	—	12/06/2024
meta/llama-3.2-3b	128K	0.2s	54 tps	$0.15/M	—	—	09/25/2024
meta/llama-3.1-70b	131K	0.3s	41 tps	$0.72/M	—	—	07/23/2024
meta/llama-3.1-8b	131K	0.1s	168 tps	$0.02/M, $0.05/M	Read: $0.03/M, Write: —	—	07/23/2024
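Prices in the table above are quoted per million tokens. A rough per-request estimate multiplies each token count by its rate and divides by 1,000,000; for example, 10,000 input tokens at $0.24/M plus 2,000 output tokens at $0.97/M gives $0.0024 + $0.00194 = $0.00434. The sketch below assumes simple linear pricing with no caching, minimums, or provider-specific adjustments, and assumes the two listed prices for `meta/llama-4-maverick` are input and output in that order.

```typescript
// Per-million-token prices in USD (assumed linear, no caching or minimums)
type Pricing = { inputPerMillion: number; outputPerMillion: number }

function estimateCostUSD(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (inputTokens * p.inputPerMillion + outputTokens * p.outputPerMillion) / 1_000_000
}

// meta/llama-4-maverick rates as listed in the table
const maverick: Pricing = { inputPerMillion: 0.24, outputPerMillion: 0.97 }

// 10,000 prompt tokens + 2,000 completion tokens
const cost = estimateCostUSD(10_000, 2_000, maverick)
console.log(cost.toFixed(5))
```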
