Meta released Llama 4 Maverick 17B 128E Instruct FP8 on April 5, 2025 as one of the first two models in the Llama 4 generation, alongside Llama 4 Scout. The collection is built around two architectural advances: native multimodality through early fusion, and a Mixture of Experts (MoE) design. Maverick is the larger and more capable of the two initial releases, with 17 billion active parameters, 128 routed experts plus one shared expert, and 400 billion total parameters. Each token activates only 17B of those 400B parameters (the shared expert plus one routed expert), which makes inference substantially more efficient than a dense 400B model while preserving the quality benefits of the larger total parameter budget.
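To make the routing pattern concrete, the following is a minimal PyTorch-style sketch of an MoE feed-forward layer in which every token passes through a shared expert and is routed to exactly one routed expert. The class names, expert MLP shapes, and dimensions are illustrative assumptions, not Llama 4's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def expert_mlp(d_model: int, d_ff: int) -> nn.Module:
    # A small MLP stands in for a real expert feed-forward block (assumed shape).
    return nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))

class SharedPlusTop1MoE(nn.Module):
    """Sketch: one always-active shared expert plus top-1 routing over routed experts."""

    def __init__(self, d_model: int, d_ff: int, n_routed: int):
        super().__init__()
        self.shared_expert = expert_mlp(d_model, d_ff)
        self.routed_experts = nn.ModuleList([expert_mlp(d_model, d_ff) for _ in range(n_routed)])
        self.router = nn.Linear(d_model, n_routed, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)   # routing probabilities over routed experts
        weight, idx = gate.max(dim=-1)             # top-1 routed expert per token
        routed_out = torch.zeros_like(x)
        for e, expert in enumerate(self.routed_experts):
            mask = idx == e
            if mask.any():                         # only the chosen expert runs for each token
                routed_out[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        # The shared expert is applied to every token regardless of routing.
        return self.shared_expert(x) + routed_out

# Toy usage with small dimensions; Maverick itself uses 128 routed experts.
moe = SharedPlusTop1MoE(d_model=64, d_ff=256, n_routed=8)
print(moe(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```

Because only the shared expert and one routed expert execute per token, the per-token compute tracks the active parameter count rather than the total parameter count.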
Llama 4's native multimodality takes a different architectural approach from the adapter-based vision of Llama 3.2, which attached cross-attention layers to an existing text model. Rather than bolting image understanding onto a text backbone after the fact, Llama 4 processes text and vision tokens together in a unified backbone from the start, which enables more coherent cross-modal reasoning.
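The contrast with adapter-based designs can be sketched in a few lines: under early fusion, projected image-patch embeddings and text token embeddings form a single sequence that the same backbone processes from its first layer onward. The module names, dimensions, and use of a generic transformer encoder below are illustrative assumptions, not the actual Llama 4 architecture.

```python
import torch
import torch.nn as nn

class EarlyFusionBackbone(nn.Module):
    """Sketch of early fusion: image patches are projected into the same embedding
    space as text tokens, and one shared transformer sees the combined sequence."""

    def __init__(self, vocab_size=32000, d_model=64, patch_dim=768, n_layers=2):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.patch_proj = nn.Linear(patch_dim, d_model)  # vision features -> token space
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, text_ids: torch.Tensor, image_patches: torch.Tensor) -> torch.Tensor:
        # text_ids: (batch, n_text); image_patches: (batch, n_patches, patch_dim)
        text_tokens = self.text_embed(text_ids)
        image_tokens = self.patch_proj(image_patches)
        # Early fusion: one concatenated sequence from the first layer on,
        # rather than cross-attention adapters added to a separate text model.
        fused = torch.cat([image_tokens, text_tokens], dim=1)
        return self.backbone(fused)

# Toy usage: 16 image patches fused with 8 text tokens in a single sequence.
model = EarlyFusionBackbone()
out = model(torch.randint(0, 32000, (1, 8)), torch.randn(1, 16, 768))
print(out.shape)  # torch.Size([1, 24, 64])
```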
On the LMArena leaderboard, an experimental chat version of Maverick scored an Elo of 1417. On standard evaluations, Maverick beats comparable models such as GPT-4o and Gemini 2.0 Flash on coding, reasoning, multilingual, long-context, and image benchmarks, and it achieves results comparable to the open-weight DeepSeek v3 on reasoning and coding at less than half the active parameters.