
Llama 4 Maverick 17B 128E Instruct FP8

Llama 4 Maverick 17B 128E Instruct FP8 is Meta's natively multimodal Mixture of Experts (MoE) model, with 17B active parameters and 128 routed experts. Meta's published benchmarks span image and text tasks, and each token activates only 17B of the model's 400B total parameters, a fraction of what a dense model of the same size uses.

Tool Use, Vision (Image)
index.ts
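import { streamText } from 'ai'
const result = streamText({
  model: 'meta/llama-4-maverick',
  prompt: 'Why is the sky blue?',
})
// Write each text chunk to stdout as it arrives
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}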

About Llama 4 Maverick 17B 128E Instruct FP8

Meta released Llama 4 Maverick 17B 128E Instruct FP8 on April 5, 2025, alongside Llama 4 Scout, as one of the first two models in the Llama 4 generation. The collection is built around two architectural advances: native multimodality through early fusion, and a Mixture of Experts (MoE) design. Maverick is the larger and more capable of the two initial releases, with 17 billion active parameters, 128 routed experts plus one shared expert, and 400 billion total parameters. Each token is sent to the shared expert and to one of the 128 routed experts, so only 17B of the 400B parameters are active for any given token, although all 400B must still be held in memory. This makes per-token inference substantially cheaper than running a dense 400B model while retaining the capacity of the larger total parameter budget.
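The routing described above can be sketched as a toy MoE layer. This is a minimal illustration under stated assumptions, not Meta's implementation: the tiny dimensions, the single-matrix "experts", and the softmax gate that scales the routed expert's output are all hypothetical choices. It shows only the structural point from the paragraph: every token passes through the shared expert, and a router selects exactly one routed expert, so only a small slice of the stored weights does work for each token.

```typescript
// Toy Mixture-of-Experts layer: one shared expert plus top-1 routing.
// All sizes, weights, and the softmax gate are hypothetical; not Meta's code.

type Vec = number[];
type Matrix = number[][];

// Each toy expert is a single square linear map.
interface Expert {
  weights: Matrix;
}

function matVec(m: Matrix, v: Vec): Vec {
  return m.map((row) => row.reduce((acc, w, j) => acc + w * v[j], 0));
}

function softmax(logits: Vec): Vec {
  const max = Math.max(...logits); // subtract max for numerical stability
  const exps = logits.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// The shared expert always runs; the router picks exactly one routed expert
// (top-1), whose output is scaled by its gate probability before summing.
function moeForward(
  token: Vec,
  shared: Expert,
  routed: Expert[],
  router: Matrix, // one row of router weights per routed expert
): { output: Vec; chosen: number } {
  const probs = softmax(matVec(router, token));
  let chosen = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[chosen]) chosen = i;
  }
  const sharedOut = matVec(shared.weights, token);
  const routedOut = matVec(routed[chosen].weights, token);
  const output = sharedOut.map((x, i) => x + probs[chosen] * routedOut[i]);
  return { output, chosen };
}

// Usage: three hypothetical routed experts that scale the input by 1, 2, 3.
const shared: Expert = { weights: [[1, 0], [0, 1]] };
const routed: Expert[] = [1, 2, 3].map((s) => ({ weights: [[s, 0], [0, s]] }));
const router: Matrix = [[0, 0], [0, 0], [5, 5]]; // strongly favors expert 2
const { output, chosen } = moeForward([1, 1], shared, routed, router);
console.log(chosen, output);

// Per-token active share implied by the figures above: 17B of 400B.
const activeFraction = 17 / 400;
console.log(activeFraction);
```

In this sketch, adding more routed experts grows the stored parameters but not the per-token work, since the shared expert plus one routed expert always run regardless of how many experts exist.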

Llama 4's native multimodality departs from the adapter-based vision approach of Llama 3.2, which feeds image-encoder outputs into a pretrained text model through cross-attention adapter layers. Rather than attaching image understanding to an existing text backbone, Llama 4 uses early fusion: text and vision tokens are processed together in a single unified backbone from the start. Meta says this enables more coherent cross-modal reasoning.
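To make the contrast concrete, here is a hypothetical sketch of early fusion. The embedding table, the linear patch projection, and the `Segment` type are illustrative assumptions rather than Llama 4's actual vision encoder; the point is only that text tokens and image patches end up as vectors of the same width in one sequence, which a single backbone can then attend over. An adapter-based design would instead keep image features outside the text sequence and inject them through extra cross-attention layers.

```typescript
// Hypothetical early-fusion sketch: text tokens and image patches are mapped
// into the same d-dimensional space and placed in ONE sequence for a single
// backbone. Tables, projection, and sizes are illustrative, not Llama 4's.

type Vec = number[];
type Matrix = number[][];

type Segment =
  | { kind: "text"; tokenIds: number[] }
  | { kind: "image"; patches: Vec[] }; // each patch is flattened pixel values

// Text side: look up each token id in an embedding table (vocab x d).
function embedText(tokenIds: number[], table: Vec[]): Vec[] {
  return tokenIds.map((id) => table[id]);
}

// Vision side: linearly project each flattened patch (d x patchSize matrix).
function embedPatches(patches: Vec[], projection: Matrix): Vec[] {
  return patches.map((p) =>
    projection.map((row) => row.reduce((acc, w, j) => acc + w * p[j], 0)),
  );
}

// Early fusion: concatenate all segments, in order, into a single sequence.
function fuse(segments: Segment[], table: Vec[], projection: Matrix): Vec[] {
  const sequence: Vec[] = [];
  for (const seg of segments) {
    const embeds =
      seg.kind === "text"
        ? embedText(seg.tokenIds, table)
        : embedPatches(seg.patches, projection);
    sequence.push(...embeds);
  }
  return sequence;
}

// Usage: a text / image / text prompt with d = 2 and 4-value patches.
const table: Vec[] = [[1, 0], [0, 1], [1, 1]]; // 3-token hypothetical vocab
const projection: Matrix = [[1, 0, 0, 0], [0, 0, 0, 1]];
const seq = fuse(
  [
    { kind: "text", tokenIds: [0, 2] },
    { kind: "image", patches: [[1, 2, 3, 4], [5, 6, 7, 8]] },
    { kind: "text", tokenIds: [1] },
  ],
  table,
  projection,
);
console.log(seq.length); // 5: two text tokens, two patches, one text token
```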

On the LMArena leaderboard, an experimental chat version of Llama 4 Maverick 17B 128E Instruct FP8 scored an Elo of 1417. Meta reports that Maverick outperforms GPT-4o and Gemini 2.0 Flash across a broad range of coding, reasoning, multilingual, long-context, and image benchmarks, and that it achieves results comparable to DeepSeek V3 on reasoning and coding with less than half the active parameters.
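As a rough aid for reading a rating like 1417, the conventional Elo expected-score formula converts a rating gap into a head-to-head preference probability. This sketch assumes the standard base-10 logistic scale with a 400-point divisor; LMArena's exact fitting method may differ, and the 1317-rated opponent below is hypothetical.

```typescript
// Conventional Elo expected score: the probability that a model rated
// ratingA is preferred over one rated ratingB (base-10 logistic, 400-point scale).
// This is an interpretation aid; LMArena's exact methodology may differ.
function eloExpectedScore(ratingA: number, ratingB: number): number {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

// Hypothetical opponent rated 100 points below 1417.
const p = eloExpectedScore(1417, 1317);
console.log(p.toFixed(3)); // 0.640
```

Under this assumption, a 100-point lead corresponds to being preferred in roughly 64% of pairwise comparisons, and equal ratings give exactly 0.5.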