Llama 3.3 70B Instruct
Llama 3.3 70B Instruct is Meta's refined text-only model. It targets 405B-class results at 70B serving cost, with improved instruction following and multilingual capability.
import { streamText } from 'ai'
const result = streamText({ model: 'meta/llama-3.3-70b', prompt: 'Why is the sky blue?' })
Playground
Try out Llama 3.3 70B Instruct by Meta. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
About Llama 3.3 70B Instruct
Meta released Llama 3.3 70B Instruct on December 6, 2024 as the final model in its 2024 Llama release cadence. The model is text-only and is a targeted refinement of the 70B tier: it delivers performance similar to Llama 3.1 405B at a fraction of the serving cost.
The core improvements center on instruction following and multilingual capability. Instruction following (the model's ability to accurately execute detailed or constrained directions) is one of the most important capabilities in production deployments where system prompts encode complex behavioral rules. The multilingual improvements matter for enterprise applications serving global audiences: better handling of non-English instructions reduces the engineering overhead of maintaining separate language-specific prompts.
Llama Stack, which Meta standardized throughout 2024 as a set of interfaces for RAG and agentic applications, is fully compatible with the 3.3 70B. Teams already using Llama Stack distributions for toolchain orchestration can upgrade to the 3.3 generation without rearchitecting their integration layer.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.
P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.
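The provider preference described above can be sketched as a request option. This is a minimal sketch, assuming the gateway reads an ordered list of provider slugs from `providerOptions.gateway.order`; the slugs `provider-a` and `provider-b` and the helper `withProviderOrder` are hypothetical placeholders, so copy real slugs from the provider list and confirm the option name in the docs.

```typescript
// Hypothetical helper: wraps an ordered list of provider slugs in the
// option shape assumed here (providerOptions.gateway.order).
function withProviderOrder(order: string[]) {
  return { providerOptions: { gateway: { order } } };
}

// 'provider-a' and 'provider-b' are placeholder slugs, not real providers.
const options = {
  model: 'meta/llama-3.3-70b',
  prompt: 'Why is the sky blue?',
  ...withProviderOrder(['provider-a', 'provider-b']),
};

// In an application, pass `options` to streamText from the 'ai' package:
//   const result = streamText(options)
```

Keeping the preference in one helper makes it easy to change the order in a single place if throughput or TTFT figures shift between providers.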
What To Consider When Choosing a Provider
- Configuration: If you're migrating from Llama 3.1 70B, test your existing prompts against 3.3 70B before you switch. Improved instruction following can change output style enough that you may need to adjust prompts. Compare $0.59 and $0.72.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use Llama 3.3 70B Instruct
Best For
- 405B quality at 70B cost: Applications that previously required the 405B for output quality but where serving a 400B+ parameter model is economically prohibitive
- Complex system prompts: Workloads depending on precise instruction following for customer support bots, structured data extraction, and multi-step reasoning chains
- Multilingual production deployments: Improved non-English instruction handling reduces prompt engineering overhead
- Upgrading from 3.1 70B: Teams that want clear quality gains without moving to a larger, more expensive model
Consider Alternatives When
- Vision required: If image understanding is part of the task, 3.3 70B is not suitable because it is text-only. Llama 3.2 90B handles multimodal inputs
- Maximum reasoning depth: If cost is not the constraint, Llama 3.1 405B remains the largest model in the Llama 3 family
- Native multimodal architecture: If you want native multimodality rather than adapter-based vision, consider Llama 4 Maverick or Scout
Conclusion
Llama 3.3 70B Instruct is the practical high-capability choice for organizations that need 405B-level instruction quality at 70B serving economics. Improved instruction following makes it well-suited to production systems with complex behavioral specifications.
Frequently Asked Questions
What specifically improved in Llama 3.3 70B Instruct over Llama 3.1 70B?
Instruction following quality and multilingual capabilities. Llama 3.3 70B Instruct delivers performance comparable to the much larger 3.1 405B, with refinements in how the model handles detailed and constrained instructions.
Is Llama 3.3 70B Instruct a drop-in upgrade from 3.1 70B?
Architecturally, yes. But improved instruction following means outputs may differ in style or format compared to 3.1 70B for the same prompts. Run regression tests against existing prompts before switching production workloads.
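One way to structure such a regression test is sketched below. It assumes you have already collected outputs for the same prompts from both models with whatever client you use; the types, the `findRegressions` helper, and the two example checks are hypothetical illustrations, not part of any SDK.

```typescript
// A format invariant that an output is expected to satisfy.
type Check = { name: string; passes: (output: string) => boolean };

// Outputs for one prompt, collected beforehand from the old and new model.
type Sample = { prompt: string; oldOutput: string; newOutput: string };

type Regression = { prompt: string; check: string };

// Report every check that the old model's output passed
// but the new model's output fails.
function findRegressions(samples: Sample[], checks: Check[]): Regression[] {
  const regressions: Regression[] = [];
  for (const sample of samples) {
    for (const check of checks) {
      if (check.passes(sample.oldOutput) && !check.passes(sample.newOutput)) {
        regressions.push({ prompt: sample.prompt, check: check.name });
      }
    }
  }
  return regressions;
}

// Example checks (hypothetical; replace with your own invariants).
const isJson: Check = {
  name: 'valid-json',
  passes: (output) => {
    try {
      JSON.parse(output);
      return true;
    } catch {
      return false;
    }
  },
};

const underLimit: Check = {
  name: 'under-200-chars',
  passes: (output) => output.length <= 200,
};
```

Checking invariants such as "parses as JSON" rather than exact string equality tolerates harmless wording changes while still flagging the style and format shifts that break downstream parsing.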
Does Llama 3.3 70B Instruct support vision inputs?
No. It is a text-only model. For multimodal workflows at the 70B scale, Llama 3.2 90B (adapter-based vision) or Llama 4 Maverick (natively multimodal) are the appropriate alternatives.
How does Llama 3.3 70B Instruct relate to the broader Llama ecosystem tooling?
It is fully compatible with Llama Stack distributions, which provide standardized interfaces for RAG and agentic application development.