Meta released Llama 4 Scout 17B 16E Instruct on April 5, 2025, alongside Llama 4 Maverick as one of the founding models of the Llama 4 generation. Scout is a mixture-of-experts (MoE) model with 17 billion active parameters, 16 experts, and 109 billion total parameters, making it substantially leaner in total parameter count than Maverick's 400B. Like Maverick, Scout was built with native multimodality across text, image, and video frame data.
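The active-versus-total parameter gap comes from MoE routing: each token activates only one expert's feed-forward weights, not all sixteen. A minimal sketch of top-1 routing, with toy dimensions rather than Scout's real ones, illustrates the idea:

```python
import numpy as np

# Toy top-1 mixture-of-experts routing. Sizes are illustrative, not
# Llama 4's real dimensions; the activation and router are stand-ins.
rng = np.random.default_rng(0)

d_model, d_ff, n_experts = 64, 256, 16
experts = [(rng.standard_normal((d_model, d_ff)) * 0.02,
            rng.standard_normal((d_ff, d_model)) * 0.02)
           for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route each token to its top-1 expert and apply only that expert's MLP."""
    logits = x @ router                      # (tokens, n_experts)
    choice = logits.argmax(axis=-1)          # top-1 expert per token
    out = np.empty_like(x)
    for e in range(n_experts):
        mask = choice == e
        if mask.any():
            w1, w2 = experts[e]
            h = np.maximum(x[mask] @ w1, 0)  # ReLU stand-in for the real activation
            out[mask] = h @ w2
    return out

tokens = rng.standard_normal((8, d_model))
y = moe_forward(tokens)

total = n_experts * 2 * d_model * d_ff   # weights stored across all experts
active = 2 * d_model * d_ff              # weights actually applied per token
print(y.shape, active / total)           # (8, 64) 0.0625
```

Per token, only 1/16 of the expert weights run, which is why a 109B-total model can have compute costs closer to a dense 17B model.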
Scout's defining characteristic is its context length. Meta extended the context window from 128K tokens in Llama 3 to 10 million tokens in Llama 4 Scout, roughly a 78x increase. The architecture enabling this is iRoPE (interleaved Rotary Position Embeddings): most layers use standard RoPE, but interleaved among them are attention layers with no positional embeddings at all. Inference-time temperature scaling of attention further improves length generalization. Scout was validated with needle-in-a-haystack retrieval tests and cumulative negative log-likelihood evaluations over 256K tokens of code.
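The iRoPE layout and the temperature scaling can be sketched schematically. Both the interleaving ratio (every fourth layer without positional embeddings) and the scaling schedule below are assumptions for illustration; Meta has not published the exact values:

```python
import math

# Assumed iRoPE layer schedule: most attention layers use RoPE, with a
# NoPE (no positional embedding) layer interleaved every fourth layer.
# The 1-in-4 ratio is a hypothetical choice for illustration.
def layer_uses_rope(layer_idx: int, nope_every: int = 4) -> bool:
    return (layer_idx + 1) % nope_every != 0

layout = ["RoPE" if layer_uses_rope(i) else "NoPE" for i in range(8)]
print(layout)  # ['RoPE', 'RoPE', 'RoPE', 'NoPE', 'RoPE', 'RoPE', 'RoPE', 'NoPE']

# Inference-time attention temperature scaling: attention logits are
# scaled by a factor that grows logarithmically with position beyond a
# base length, so far-away tokens are not washed out by softmax. The
# alpha and base values here are illustrative, not Meta's.
def attn_scale(pos: int, alpha: float = 0.1, base: int = 8192) -> float:
    return 1.0 + alpha * math.log(max(pos / base, 1.0))
```

Within the base length the scale stays at 1.0, so short-context behavior is unchanged; only beyond it does the temperature grow.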
A context of 10 million tokens can hold roughly 7.5 million words of plain text, the equivalent of approximately 25 full-length novels, or a large enterprise codebase with all source files, documentation, and test suites loaded together. Use cases include multi-document summarization across a large corpus, parsing extensive user activity logs, and reasoning over entire codebases in a single prompt without chunking or retrieval-augmented generation (RAG). This last capability is particularly notable for software development tooling, where RAG-based approaches introduce retrieval errors and context fragmentation.
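Before sending an entire corpus in one prompt, it is worth checking that it actually fits the window. A rough budget check might look like the following; the 4-characters-per-token heuristic is a common approximation, not Scout's actual tokenizer, so a real application should count with the model's tokenizer instead:

```python
# Rough context-budget check before loading a whole corpus into one prompt.
# CONTEXT_LIMIT reflects Scout's advertised 10M-token window; the
# 4-chars-per-token estimate is a crude heuristic, not the real tokenizer.
CONTEXT_LIMIT = 10_000_000

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits_in_context(documents, reserve_for_output: int = 8_192):
    """Return (fits, tokens_used), leaving headroom for the model's reply."""
    used = sum(estimate_tokens(d) for d in documents)
    return used + reserve_for_output <= CONTEXT_LIMIT, used

# Ten synthetic "files" of ~100K words each still use only a fraction
# of the window.
docs = ["word " * 100_000] * 10
ok, used = fits_in_context(docs)
print(ok, used)  # True 1250000
```

Even this ten-file corpus, far larger than Llama 3's 128K window, consumes under 13% of Scout's budget, which is what makes whole-codebase prompting plausible without chunking.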
Scout delivers strong results across a broad range of benchmarks relative to other models in its class. It also supports image grounding (aligning user prompts with specific visual regions) and exceeds prior Llama models on coding, reasoning, long-context, and image benchmarks.