Llama 3.1 70B Instruct
Llama 3.1 70B Instruct brought a 131.1K-token context window, support for eight languages, and trained tool-use capability to the open 70B parameter class. The release set the template for Meta's open frontier models.
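```typescript
import { streamText } from 'ai'
const result = streamText({ model: 'meta/llama-3.1-70b', prompt: 'Why is the sky blue?' })
// Print the response as it streams in (top-level await needs an ES module)
for await (const text of result.textStream) process.stdout.write(text)
```
Playground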
Try out Llama 3.1 70B Instruct by Meta. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
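A minimal sketch of setting a provider preference in code, assuming the `order` routing option under `providerOptions.gateway` that the AI Gateway docs describe. The slugs `'provider-a'` and `'provider-b'` are hypothetical placeholders for real slugs copied from the provider list.

```typescript
// Sketch: provider preference for AI Gateway routing (assumes the documented `order` option).
// 'provider-a' and 'provider-b' are placeholders, not real provider slugs.
const providerOptions = {
  gateway: {
    // Providers to try, in order of preference
    order: ['provider-a', 'provider-b'],
  },
}

// Passed alongside the model and prompt, for example:
// streamText({ model: 'meta/llama-3.1-70b', prompt: 'Why is the sky blue?', providerOptions })
```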
For each provider, AI Gateway reports three metrics from live traffic: P50 throughput in tokens per second (TPS), P50 time to first token (TTFT) in milliseconds, and direct request success rate, both on AI Gateway overall and per provider. Visit the docs for more info.
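If you collect the same metrics from your own requests, the sketch below shows one way to summarize them: a median (P50) over per-request samples and a success rate over request outcomes. It is an illustrative assumption, not AI Gateway's documented aggregation method; `p50` and `successRate` are hypothetical helper names.

```typescript
// Median (P50) of per-request samples, e.g. TTFT in ms or throughput in TPS.
// Sketch only: AI Gateway's own aggregation method is not described here.
function p50(samples: number[]): number {
  if (samples.length === 0) throw new Error('p50 of an empty sample set is undefined')
  const sorted = [...samples].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  // Even count: average the two middle values
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

// Success rate as the fraction of requests that completed without error
function successRate(outcomes: boolean[]): number {
  if (outcomes.length === 0) throw new Error('no requests recorded')
  return outcomes.filter(Boolean).length / outcomes.length
}
```

The median is used rather than the mean so that a few slow outliers do not dominate the summary.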
About Llama 3.1 70B Instruct
Meta released Llama 3.1 70B Instruct on July 23, 2024. The 70B variant sat in the middle of a three-model release (8B, 70B, 405B) but carried the full 3.1 feature set: a context window of 131.1K tokens (16 times longer than Llama 3's limit), multilingual support across eight languages, and tool-use capability trained directly into the model rather than bolted on through prompting tricks.
The eight supported languages are English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. Meta evaluated the 3.1 models on more than 150 benchmark datasets spanning a wide range of languages and reported the 70B as competitive with open and closed models of similar size. In practice, a single model can handle conversations that move between these languages without separate language-specific deployments.
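To illustrate the single-deployment point, here is a hypothetical routing helper that sends any of the eight languages to one shared model and everything else to a fallback the caller picks. The function name `pickModel` and the use of ISO 639-1 codes are assumptions for illustration.

```typescript
// The eight languages listed for Llama 3.1, as ISO 639-1 codes
const LLAMA_31_LANGUAGES = new Set(['en', 'de', 'fr', 'it', 'pt', 'hi', 'es', 'th'])

// Hypothetical helper: one model for all supported languages, a caller-chosen fallback otherwise
function pickModel(langTag: string, fallback: string): string {
  // Normalize tags like 'pt-BR' or 'TH' to the lowercase primary subtag
  const primary = langTag.toLowerCase().split('-')[0]
  return LLAMA_31_LANGUAGES.has(primary) ? 'meta/llama-3.1-70b' : fallback
}
```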
Trained tool use was new to the Llama family with the 3.1 generation, which was explicitly trained for agentic workflows: calling APIs, querying databases, and invoking custom functions as part of a reasoning chain. Alongside 3.1, Meta also proposed the Llama Stack API, a set of standardized interfaces for connecting the model to external toolchains and retrieval-augmented generation (RAG) systems.
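On the application side, a tool call has to be parsed and dispatched. The sketch below assumes the model emits a JSON object of the form `{"name": ..., "parameters": {...}}`, which is how Meta's Llama 3.1 prompt-format documentation describes custom tool calls; verify against your provider's output. The `get_weather` tool is hypothetical.

```typescript
// Shape of a custom tool call, assuming Llama 3.1's JSON tool-calling format
interface ToolCall {
  name: string
  parameters: Record<string, unknown>
}

type ToolFn = (args: Record<string, unknown>) => unknown

// Hypothetical tool registry; get_weather is illustrative, not a real API
const tools = new Map<string, ToolFn>([
  ['get_weather', (args) => ({ city: args.city, forecast: 'sunny' })],
])

function isToolCall(x: unknown): x is ToolCall {
  if (typeof x !== 'object' || x === null) return false
  const c = x as { name?: unknown; parameters?: unknown }
  return typeof c.name === 'string' && typeof c.parameters === 'object' && c.parameters !== null
}

// Invoke the matching tool, or return null if the output is plain text
// or a call to a tool that is not registered
function dispatch(raw: string): unknown {
  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch {
    return null // not JSON, so treat it as a normal text answer
  }
  if (!isToolCall(parsed)) return null
  const fn = tools.get(parsed.name)
  return fn ? fn(parsed.parameters) : null
}
```

A `Map` is used instead of a plain object so that names such as `toString` cannot resolve to inherited properties.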
Meta also loosened the license terms in 3.1: you can use model outputs to train or improve other models, subject to the conditions of the Llama 3.1 Community License.
What To Consider When Choosing a Provider
- Configuration: For workloads that push the full 131.1K-token context window (processing entire codebases or book-length documents), provider differences in long-context throughput become significant. Benchmark with representative payloads before you commit, and compare the listed rates ($0.72 and $0.72) across providers.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
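For the benchmarking step, a minimal sketch that times a single streamed response. It works on any `AsyncIterable<string>`, such as the `textStream` of a `streamText` result. It counts chunks rather than tokens, so any throughput derived from it is an approximation; `timeStream` is a hypothetical helper, and the injectable clock exists only to make it testable.

```typescript
// Timing for one streamed response
interface StreamTiming {
  ttftMs: number // time until the first chunk arrived
  totalMs: number // time until the stream finished
  chunks: number // number of text chunks received (not tokens)
}

async function timeStream(
  stream: AsyncIterable<string>,
  now: () => number = () => performance.now(),
): Promise<StreamTiming> {
  const start = now()
  let first: number | undefined
  let chunks = 0
  for await (const _chunk of stream) {
    if (first === undefined) first = now() // record time to first chunk
    chunks++
  }
  const end = now()
  return { ttftMs: (first ?? end) - start, totalMs: end - start, chunks }
}
```

Running this over the same representative long-context prompts on each provider gives comparable per-request samples.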
When to Use Llama 3.1 70B Instruct
Best For
- Long-context workloads: Processing legal documents, technical manuals, or entire codebases within the full window of 131.1K tokens in a single pass without chunking
- Multilingual applications: A single model handles conversations and instructions across English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai
- Agentic systems: Trained tool-use capability for calling external APIs, databases, and custom functions as part of multi-step reasoning
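For the long-context case, here is a rough pre-flight check on whether a document fits in one pass. The window is 131,072 tokens (128 × 1024, shown above as 131.1K), and it must hold both the prompt and the generated output. The four-characters-per-token ratio and the 4,096-token output reserve are assumptions. Real counts depend on the Llama 3.1 tokenizer and the language, so use a tokenizer when precision matters.

```typescript
// Llama 3.1's context window: 128 × 1024 = 131,072 tokens
const CONTEXT_TOKENS = 131_072

// Rough estimate assuming ~4 characters per token (an English-text heuristic, not exact)
function estimateTokens(text: string, charsPerToken = 4): number {
  return Math.ceil(text.length / charsPerToken)
}

// True if the prompt plus a reserved output budget should fit without chunking
function fitsInOnePass(text: string, reservedOutputTokens = 4_096): boolean {
  return estimateTokens(text) + reservedOutputTokens <= CONTEXT_TOKENS
}
```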
Consider Alternatives When
- Smaller model sufficient: If a smaller, faster model meets your quality bar, Llama 3.1 8B is the lighter option within the same generation
- Vision required: If you need image understanding, Llama 3.2 11B or 90B accept image input natively
- Refined 70B preferred: If maximum instruction-following quality at 70B is the priority, evaluate a newer 70B release such as Llama 3.3 70B
Conclusion
Llama 3.1 70B Instruct showed that an open model could combine a 131.1K-token context window, multilingual capability, and trained tool use in a single package. Later Llama 70B releases, such as Llama 3.3 70B, build on that same baseline.
Frequently Asked Questions
Why was the jump to a 131.1K-token context significant in Llama 3.1 70B Instruct?
Llama 3 supported only 8K tokens of context. The 16x expansion to 131.1K tokens in Llama 3.1 70B Instruct lets you process many codebases, book-length documents, and multi-hour transcripts in a single pass, which removes much of the chunking and retrieval complexity that shorter context windows require.
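The 16x figure follows directly from the token counts. This takes Llama 3's 8K limit as 8,192 tokens and the 131.1K window as 131,072 tokens:

$$\frac{131{,}072}{8{,}192} = \frac{2^{17}}{2^{13}} = 2^{4} = 16$$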
Which eight languages does Llama 3.1 70B Instruct support?
English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. Llama 3.1 70B Instruct was evaluated across more than 150 benchmark datasets spanning these languages.
What does trained tool use mean in practice?
The model emits structured calls to external functions, APIs, and databases as part of its reasoning, and your application executes each call and returns the result to the model. Because this capability was learned during training rather than coaxed out through prompt engineering, function calls tend to be more reliable and less dependent on careful prompt construction.
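The round trip can be sketched as a small loop. It is a generic illustration under stated assumptions: `callModel` and `runTool` are hypothetical functions you supply. The `'tool'` role name is generic, while Llama 3.1's raw prompt format uses its own role name for tool output. The step cap guards against a model that never stops calling tools.

```typescript
// One conversation message; 'tool' carries a tool result back to the model
type Message = { role: 'user' | 'assistant' | 'tool'; content: string }

// Hypothetical model client: returns either final text or a JSON tool call
type ModelFn = (messages: Message[]) => Promise<string>

async function runAgent(
  prompt: string,
  callModel: ModelFn,
  runTool: (raw: string) => string | null, // null when the reply is not a tool call
  maxSteps = 5,
): Promise<string> {
  const messages: Message[] = [{ role: 'user', content: prompt }]
  for (let step = 0; step < maxSteps; step++) {
    const reply = await callModel(messages)
    messages.push({ role: 'assistant', content: reply })
    const toolResult = runTool(reply)
    if (toolResult === null) return reply // no tool call, so this is the final answer
    // The application, not the model, executes the tool and feeds the result back
    messages.push({ role: 'tool', content: toolResult })
  }
  throw new Error(`no final answer after ${maxSteps} steps`)
}
```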
What is Llama Stack and how does it relate to the 70B?
Llama Stack is Meta's set of standardized interfaces for building retrieval-augmented generation and agentic applications. Meta first proposed it, as a request for comment, alongside the 3.1 generation. It provides a consistent integration layer for connecting the model to external tools and data sources.