
Llama 3.3 70B Instruct

Llama 3.3 70B Instruct is Meta's refined text-only model. It targets 405B-class results at 70B serving cost, with improved instruction following and multilingual capability.

Tool Use
index.ts
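import { streamText } from 'ai';

const result = streamText({
  model: 'meta/llama-3.3-70b',
  prompt: 'Why is the sky blue?',
});

// Print the response as tokens arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}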

Playground

Try out Llama 3.3 70B Instruct by Meta. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

About Llama 3.3 70B Instruct

Meta released Llama 3.3 70B Instruct on December 6, 2024 as the final model in its 2024 Llama release cadence. The model is text-only and is a targeted refinement of the 70B tier: it delivers performance similar to Llama 3.1 405B at a fraction of the serving cost.

The core improvements center on instruction following and multilingual capability. Instruction following (the model's ability to accurately execute detailed or constrained directions) is one of the most important capabilities in production deployments where system prompts encode complex behavioral rules. The multilingual improvements matter for enterprise applications serving global audiences: better handling of non-English instructions reduces the engineering overhead of maintaining separate language-specific prompts.

Llama Stack, which Meta standardized throughout 2024 as a set of interfaces for RAG and agentic applications, is fully compatible with the 3.3 70B. Teams already using Llama Stack distributions for toolchain orchestration can upgrade to the 3.3 generation without rearchitecting their integration layer.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider        Legal           Context  Latency  Throughput  Input    Output   Release Date
Amazon Bedrock  Terms, Privacy  128K     0.2s     194tps      $0.72/M  $0.72/M  12/06/2024
Groq            Terms, Privacy  128K     0.1s     -           $0.59/M  $0.79/M  12/06/2024

Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
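
As a rough illustration of what a P50 figure means, the sketch below computes the median of a list of samples. The sample values are hypothetical, and the function averages the two middle values for even-length lists; the gateway's own percentile method is not documented on this page and may differ.

```typescript
// Median (P50) of a list of samples, e.g. time to first token in milliseconds.
// Assumption: even-length lists average the two middle values.
function p50(samples: number[]): number {
  if (samples.length === 0) {
    throw new Error("p50 requires at least one sample");
  }
  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical TTFT samples in milliseconds, not measurements from any provider.
const ttftMs = [95, 110, 120, 180, 900];
console.log(p50(ttftMs)); // 120
```

A single slow request (900 ms here) does not move the P50, which is why a median can look better than the latency some individual requests actually see.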

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Meta

Model  Context  Latency  Throughput  Input    Output   Cache                    Providers                         Release Date
-      131K     0.2s     26tps       $0.24/M  $0.97/M  -                        bedrock, deepinfra                04/05/2025
-      131K     0.2s     200tps      $0.17/M  $0.66/M  -                        bedrock, deepinfra, groq          04/05/2025
-      128K     0.2s     182tps      $0.16/M  $0.16/M  -                        bedrock                           09/25/2024
-      128K     0.3s     52tps       $0.15/M  $0.15/M  -                        bedrock                           09/18/2024
-      131K     0.1s     95tps       $0.10/M  $0.10/M  Read: $0.1/M, Write: -   bedrock, cerebras, deepinfra, +2  07/23/2024
-      131K     0.3s     32tps       $0.72/M  $0.72/M  -                        bedrock, deepinfra                07/23/2024

What To Consider When Choosing a Provider

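  • Configuration: If you're migrating from Llama 3.1 70B, test your existing prompts against 3.3 70B before you switch. Improved instruction following can change output style enough that some prompts need adjusting. Also compare pricing across providers: Groq lists $0.59/M input and $0.79/M output, while Amazon Bedrock lists $0.72/M for both, so the cheaper option depends on your ratio of input to output tokens.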
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
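
To make the pricing comparison concrete, here is a small sketch that estimates cost per provider for a given token mix. The prices are copied from the provider table on this page and may change; the `ProviderPricing` type and the function names are illustrative, not part of any SDK.

```typescript
interface ProviderPricing {
  name: string;
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

// Prices as listed on this page at the time of writing (assumption: still current).
const providers: ProviderPricing[] = [
  { name: "Amazon Bedrock", inputPerMillion: 0.72, outputPerMillion: 0.72 },
  { name: "Groq", inputPerMillion: 0.59, outputPerMillion: 0.79 },
];

// Estimated cost in USD for a workload with the given token counts.
function estimateCost(p: ProviderPricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.inputPerMillion + outputTokens * p.outputPerMillion) / 1_000_000;
}

// Provider with the lowest estimated cost for the given token mix.
function cheapest(list: ProviderPricing[], inputTokens: number, outputTokens: number): ProviderPricing {
  return list.reduce((best, p) =>
    estimateCost(p, inputTokens, outputTokens) < estimateCost(best, inputTokens, outputTokens) ? p : best
  );
}

// Input-heavy workload (10M in, 1M out): Groq is cheaper.
console.log(cheapest(providers, 10_000_000, 1_000_000).name);
// Output-heavy workload (1M in, 10M out): Amazon Bedrock is cheaper.
console.log(cheapest(providers, 1_000_000, 10_000_000).name);
```

With these list prices, the break-even point is roughly 1.86 output tokens per input token (0.13 / 0.07); below that ratio Groq is cheaper, above it Bedrock is. Price is only one factor, and the latency and throughput columns may matter more for interactive workloads.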

When to Use Llama 3.3 70B Instruct

Best For

  • 405B quality at 70B cost: Applications that previously required the 405B for output quality but where serving a 400B+ parameter model is economically prohibitive
  • Complex system prompts: Workloads depending on precise instruction following for customer support bots, structured data extraction, and multi-step reasoning chains
  • Multilingual production deployments: Improved non-English instruction handling reduces prompt engineering overhead
  • Upgrading from 3.1 70B: Teams that want clear quality gains without moving to a larger, more expensive model

Consider Alternatives When

  • Vision required: If image understanding is part of the task, 3.3 70B will not work because it is text-only. Llama 3.2 90B handles multimodal inputs.
  • Maximum reasoning depth: If cost is not the constraint, Llama 3.1 405B remains the largest model in the Llama 3.1 family.
  • Native multimodal architecture: If you want native multimodality rather than adapter-based vision, consider Llama 4 Maverick or Scout.

Conclusion

Llama 3.3 70B Instruct is the practical high-capability choice for organizations that need 405B-level instruction quality at 70B serving economics. Improved instruction following makes it well-suited to production systems with complex behavioral specifications.

Frequently Asked Questions

  • What specifically improved in Llama 3.3 70B Instruct over Llama 3.1 70B?

    Instruction-following quality and multilingual capability both improved. Llama 3.3 70B Instruct delivers performance comparable to the much larger 3.1 405B, with refinements in how the model handles detailed and constrained instructions.

  • Is Llama 3.3 70B Instruct a drop-in upgrade from 3.1 70B?

    Architecturally, yes. But improved instruction following means outputs may differ in style or format compared to 3.1 70B for the same prompts. Run regression tests against existing prompts before switching production workloads.

  • Does Llama 3.3 70B Instruct support vision inputs?

    No. It is a text-only model. For multimodal workflows at the 70B scale, Llama 3.2 90B (adapter-based vision) or Llama 4 Maverick (natively multimodal) are the appropriate alternatives.

  • How does Llama 3.3 70B Instruct relate to the broader Llama ecosystem tooling?

    It is fully compatible with Llama Stack distributions, which provide standardized interfaces for RAG and agentic application development.
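
For the regression testing recommended when moving from 3.1 70B, one possible shape is sketched below. Everything here is illustrative: the `Generate` type stands in for whatever client call you use, the `meta/llama-3.1-70b` slug is an assumption (only `meta/llama-3.3-70b` appears on this page), and the stub returns canned text so the example runs without network access.

```typescript
// A prompt plus a predicate that an acceptable output must satisfy.
interface PromptCase {
  name: string;
  prompt: string;
  check: (output: string) => boolean;
}

// Hypothetical signature: sends a prompt to a model and resolves to its text output.
type Generate = (model: string, prompt: string) => Promise<string>;

// Returns the names of cases that pass on the old model but fail on the new one.
async function findRegressions(
  cases: PromptCase[],
  generate: Generate,
  oldModel: string,
  newModel: string
): Promise<string[]> {
  const regressions: string[] = [];
  for (const c of cases) {
    const [oldOut, newOut] = await Promise.all([
      generate(oldModel, c.prompt),
      generate(newModel, c.prompt),
    ]);
    if (c.check(oldOut) && !c.check(newOut)) {
      regressions.push(c.name);
    }
  }
  return regressions;
}

// True when the whole output parses as JSON.
function isJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

const OLD_MODEL = "meta/llama-3.1-70b"; // assumed slug
const NEW_MODEL = "meta/llama-3.3-70b";

// Stub in place of a real model call; replace with your own client.
const stubGenerate: Generate = async (model) =>
  model === OLD_MODEL
    ? '{"city":"Paris"}'
    : 'Sure! Here is the JSON: {"city":"Paris"}';

const cases: PromptCase[] = [
  { name: "json-only", prompt: "Return the capital of France as JSON.", check: isJson },
  { name: "mentions-paris", prompt: "Return the capital of France as JSON.", check: (o) => o.includes("Paris") },
];

// With the stub above, only the "json-only" case is reported.
findRegressions(cases, stubGenerate, OLD_MODEL, NEW_MODEL).then((r) => console.log(r));
```

Checking properties of the output (valid JSON, required fields, length limits) tends to be more robust than exact string comparison, since the two models will rarely produce identical text even when both are correct.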