Llama 3.2 90B Vision Instruct is Meta's largest open vision-language model, released on September 25, 2024. It's built on the Llama 3.1 70B language foundation, with a cross-attention adapter connecting a vision encoder to the language backbone; the adapter and encoder account for the roughly 20B parameters beyond the 70B base. That 70B language component has substantially more capacity for complex reasoning, synthesis, and generation than the 11B variant's 8B backbone, so tasks that pair difficult visual understanding with demanding text generation are better served here.
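As a minimal sketch of how this looks in practice, the snippet below loads the model through Hugging Face transformers' Mllama integration (available from transformers 4.45) and asks a question about one image. The model ID matches the official release, but the image file name and prompt are placeholders, and running at this scale realistically requires multiple high-memory GPUs (roughly 180 GB of weights in bfloat16) or a quantized variant.

```python
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "meta-llama/Llama-3.2-90B-Vision-Instruct"

# Load the 90B checkpoint, sharding across available GPUs.
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder image and question for illustration.
image = Image.open("diagram.png")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Explain this technical diagram step by step."},
    ]},
]

# Render the chat template, combine it with the image, and generate.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```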
The context window of 128K tokens accommodates extended visual conversations: a sequence of images with questions, a technical document with figures and tables, or an annotated slide deck can all fit in a single context. For enterprise use cases like research document analysis, technical diagram interpretation, and medical image description, the combination of large language model capacity and vision capability is more practical than at the 11B scale.
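To illustrate how such an extended visual session might be assembled, the sketch below continues the setup above with a multi-turn history, where earlier turns stay in the 128K-token context so follow-up questions can refer back to them. The figure file name, the questions, and the prior assistant reply are all hypothetical, and `model` and `processor` are assumed to be loaded as in the previous snippet.

```python
from PIL import Image

# Placeholder figure from a technical document (hypothetical file name).
figure = Image.open("report_figure_3.png")

# Multi-turn history: the earlier exchange remains in context,
# so the follow-up can reference the same figure and answer.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Summarize the trend shown in this figure."},
    ]},
    {"role": "assistant", "content": [
        {"type": "text", "text": "Throughput rises steadily, then plateaus after 32 nodes."},
    ]},
    {"role": "user", "content": [
        {"type": "text", "text": "What bottleneck could explain that plateau?"},
    ]},
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=figure, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```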