Llama 4 Maverick 17B 128E Instruct FP8
Llama 4 Maverick 17B 128E Instruct FP8 is Meta's natively multimodal Mixture of Experts (MoE) model with 17B active parameters across 128 experts. Meta's published benchmarks cover both image and text tasks, and the MoE architecture activates only a fraction of the parameters per token that a comparable dense model would use.
```ts
import { streamText } from 'ai';

const result = streamText({
  model: 'meta/llama-4-maverick',
  prompt: 'Why is the sky blue?',
});
```

Frequently Asked Questions
What is Mixture of Experts (MoE) and how does it work in Llama 4 Maverick 17B 128E Instruct FP8?
Each input token activates only a subset of the total parameters. Llama 4 Maverick 17B 128E Instruct FP8 uses alternating dense and MoE layers. MoE layers route each token to a shared expert plus one of 128 routed experts. Only 17B of the 400B total parameters are active per token, reducing inference cost while the full parameter budget contributes to model quality.
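The routing step can be sketched in a few lines of TypeScript. This is an illustrative top-1 router only: the `Vector`, `Expert`, and `moeLayer` names are hypothetical, and the production model fuses these operations into optimized kernels rather than looping over logits.

```ts
type Vector = number[];
type Expert = (x: Vector) => Vector; // stand-in for an expert FFN block

const NUM_ROUTED_EXPERTS = 128;

// Route one token through the shared expert plus its single best routed expert.
function moeLayer(
  token: Vector,
  routerLogits: number[], // one score per routed expert, from a learned router
  routedExperts: Expert[],
  sharedExpert: Expert,
): Vector {
  // Top-1 routing: pick the routed expert with the highest logit.
  let best = 0;
  for (let e = 1; e < NUM_ROUTED_EXPERTS; e++) {
    if (routerLogits[e] > routerLogits[best]) best = e;
  }
  const routed = routedExperts[best](token);
  const shared = sharedExpert(token);
  // Only the shared expert and one of the 128 routed experts ran for this
  // token, so a small fraction of the total expert parameters was active.
  return token.map((_, i) => routed[i] + shared[i]);
}
```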
What does "natively multimodal" mean compared to the adapter-based vision in Llama 3.2?
Llama 3.2 added vision to an existing text backbone via cross-attention adapters, keeping the language-model weights frozen. Llama 4 Maverick 17B 128E Instruct FP8 instead uses early fusion: text and vision tokens are processed together in a unified backbone. This enables deeper cross-modal reasoning because the model was jointly pre-trained on text, image, and video data rather than having vision bolted on after the fact.
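Because text and vision share one backbone, image inputs go through the same chat interface. A minimal sketch using the AI SDK's message content parts, assuming your provider accepts image input for this model; the image URL is a placeholder:

```ts
import { generateText } from 'ai';

const { text } = await generateText({
  model: 'meta/llama-4-maverick',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe what is happening in this photo.' },
        // Placeholder URL; any fetchable image works here.
        { type: 'image', image: new URL('https://example.com/photo.jpg') },
      ],
    },
  ],
});

console.log(text);
```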
What Elo score did Llama 4 Maverick 17B 128E Instruct FP8 achieve on LMArena?
An experimental chat version of Llama 4 Maverick 17B 128E Instruct FP8 scored an Elo of 1417 on LMArena.
What languages does Llama 4 support?
Llama 4 was pre-trained on 200 languages, including over 100 with more than 1 billion tokens each, amounting to 10x more multilingual tokens than Llama 3.