
Llama 3.2 1B Instruct

Llama 3.2 1B Instruct is Meta's smallest openly available model, with a context window of 128K tokens. It delivers text generation, summarization, and tool calling with minimal memory and compute requirements.

index.ts
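import { streamText } from 'ai'

// Stream a completion from Llama 3.2 1B Instruct through AI Gateway.
const result = streamText({
  model: 'meta/llama-3.2-1b',
  prompt: 'Why is the sky blue?'
})

// Print each text chunk as it arrives.
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}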

Playground

Try out Llama 3.2 1B Instruct by Meta. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

About Llama 3.2 1B Instruct

Meta released Llama 3.2 1B Instruct on September 18, 2024 as the smallest model in the Llama 3.2 collection. It supports a 128K-token context window and covers summarization, instruction following, rewriting, language reasoning, and tool use. On these tasks it is competitive with similarly sized models from other families.

The 1B is also the basis for Llama Guard 3 1B, a companion safety model derived from this checkpoint for content moderation in memory-constrained environments.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider        Context  Latency  Throughput  Input    Output   Release Date
Amazon Bedrock  128K     0.2s     88 tps      $0.10/M  $0.10/M  09/18/2024

Amazon Bedrock Legal: Terms, Privacy

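
As a rough illustration of setting a provider preference, the sketch below builds the options object for a request that lists preferred provider slugs in order. The `providerOptions.gateway.order` shape, the `withProviderOrder` helper, and the `bedrock` slug are assumptions made for this example; check the AI Gateway docs for the exact option names before relying on them.

```typescript
// Hypothetical request shape. The `providerOptions.gateway.order` field is an
// assumption for illustration, not confirmed by this page.
type GatewayRequestOptions = {
  model: string
  prompt: string
  providerOptions: { gateway: { order: string[] } }
}

// Hypothetical helper: attach an ordered list of preferred provider slugs.
function withProviderOrder(
  model: string,
  prompt: string,
  order: string[],
): GatewayRequestOptions {
  if (order.length === 0) {
    throw new Error('order must list at least one provider slug')
  }
  // Copy the array so later changes by the caller do not affect the request.
  return { model, prompt, providerOptions: { gateway: { order: [...order] } } }
}

// 'bedrock' is assumed to be the slug copied from the Providers table.
const options = withProviderOrder(
  'meta/llama-3.2-1b',
  'Why is the sky blue?',
  ['bedrock'],
)
console.log(JSON.stringify(options.providerOptions))
```

If the option shape matches, an object like this could be passed to `streamText` in place of the plain `model` and `prompt` pair.
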
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.
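
To make the three metrics above concrete, here is a minimal sketch of how a P50 (median) and a success rate can be computed from raw samples. The sample values are made up, and the median-with-averaging rule is an assumption; this page does not say how AI Gateway aggregates its traffic.

```typescript
// P50 (median): sort the samples, then take the middle value, averaging the
// two middle values when the count is even.
function p50(samples: number[]): number {
  if (samples.length === 0) throw new Error('no samples')
  const sorted = [...samples].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2
}

// Success rate as a percentage of all requests.
function successRate(succeeded: number, total: number): number {
  if (total <= 0) throw new Error('total must be positive')
  return (succeeded * 100) / total
}

// Made-up samples, for illustration only.
const ttftMs = [180, 220, 200, 950, 190] // time to first token, milliseconds
const tps = [90, 85, 88, 91] // tokens per second

console.log(p50(ttftMs)) // 200
console.log(p50(tps)) // 89
console.log(successRate(997, 1000)) // 99.7
```

The median is insensitive to outliers: the single 950 ms sample does not move the P50 latency.
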

More models by Meta

Context  Latency  Throughput  Input    Output   Cache         Providers                         Release Date
131K     0.2s     44 tps      $0.24/M  $0.97/M                bedrock, deepinfra                04/05/2025
131K     0.2s     44 tps      $0.17/M  $0.66/M                bedrock, deepinfra, groq          04/05/2025
128K     0.1s     215 tps     $0.59/M  $0.72/M                bedrock, groq                     12/06/2024
128K     0.3s     53 tps      $0.15/M  $0.15/M                bedrock                           09/18/2024
131K     0.1s     52 tps      $0.10/M  $0.10/M  Read: $0.1/M  bedrock, cerebras, deepinfra, +2  07/23/2024
131K     0.2s     39 tps      $0.72/M  $0.72/M                bedrock, deepinfra                07/23/2024

What To Consider When Choosing a Provider

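  • Configuration: Compare input and output pricing across providers. Amazon Bedrock lists $0.10 per million input tokens and $0.10 per million output tokens.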
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
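
For the pricing comparison, a small cost estimate can help. The sketch below uses the $0.10 per million token prices listed for Amazon Bedrock in the Providers table; the `estimateCostUsd` helper and the token counts are illustrative assumptions.

```typescript
// Prices in USD per million tokens, as listed for Amazon Bedrock.
const INPUT_USD_PER_MILLION = 0.10
const OUTPUT_USD_PER_MILLION = 0.10

// Hypothetical helper: estimate the cost of a workload from its token counts.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_USD_PER_MILLION +
    (outputTokens / 1_000_000) * OUTPUT_USD_PER_MILLION
  )
}

// Example workload: 2M input tokens and 500K output tokens.
console.log(estimateCostUsd(2_000_000, 500_000).toFixed(2)) // "0.25"
```

Swapping in another provider's input and output prices gives a like-for-like comparison for the same workload.
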

When to Use Llama 3.2 1B Instruct

Best For

  • Lightweight agentic apps: Summarization, instruction following, and tool calling
  • Prototyping and experimentation: Minimal-cost iteration before scaling to a larger model
  • High-throughput classification: Routing and tagging where per-token economics dominate
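
For the classification and routing use case, a small model's free-form output usually needs validating before it drives any routing decision. The following is a minimal sketch under assumptions: the label set, the prompt wording, and both helper names are hypothetical, and the model call itself is left out.

```typescript
// Hypothetical label set for a support-message router.
const LABELS = ['billing', 'technical', 'sales', 'other'] as const
type Label = (typeof LABELS)[number]

// Build a prompt that asks the model to answer with a single label.
function buildPrompt(text: string): string {
  return (
    `Classify the message into one of: ${LABELS.join(', ')}.\n` +
    `Reply with the label only.\n\nMessage: ${text}`
  )
}

// Normalize raw model output; fall back to 'other' for anything unexpected.
function parseLabel(raw: string): Label {
  const cleaned = raw.trim().toLowerCase().replace(/[^a-z]/g, '')
  return (LABELS as readonly string[]).includes(cleaned)
    ? (cleaned as Label)
    : 'other'
}

console.log(parseLabel(' Billing.\n')) // billing
console.log(parseLabel('I think this is about refunds')) // other
```

Falling back to a catch-all label keeps the router predictable when the model replies with a sentence instead of a label.
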

Consider Alternatives When

  • Higher reasoning quality: When your needs exceed what a 1B model can deliver, Llama 3.2 3B offers improved capability at a still-modest size
  • Complex multi-step tasks: Coding assistance or nuanced instruction following is better suited to 8B or 70B models
  • Image understanding needed: The 1B is text-only; Llama 3.2 11B is the smallest Llama 3.2 model with vision support

Conclusion

Llama 3.2 1B Instruct is the smallest Llama model, suitable for summarization, tool calling, and instruction following at minimal per-token cost.

Frequently Asked Questions

  • What is the context window of Llama 3.2 1B Instruct?

    128K tokens. The model's trained capabilities include summarization, instruction following, rewriting, and tool use.

  • Does Llama 3.2 1B Instruct support tool calling?

    Yes. Tool calling is one of the trained capabilities included alongside summarization, instruction following, and rewriting, making it suitable for lightweight agentic applications that need to invoke external actions.

  • Is there a companion safety model for the 1B?

    Yes. Llama Guard 3 1B was built from this checkpoint for content moderation in memory-constrained environments.

  • How does Llama 3.2 1B Instruct compare to Gemma at similar scales?

    Llama 3.2 1B Instruct is competitive with Gemma on summarization, instruction following, and tool use. The larger Llama 3.2 3B outperforms Gemma 2 2.6B on those same tasks.
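
On the tool-calling question above, the application side of a tool call can be sketched as parsing the model's requested call and running a matching local function. Everything here is an assumption for illustration: the `{ name, arguments }` JSON shape, the `getWeather` stub, and the `dispatchToolCall` helper are hypothetical and not the format of any particular SDK.

```typescript
// Hypothetical tool registry: each tool receives a JSON-like argument object.
type ToolArgs = Record<string, unknown>
type Tool = (args: ToolArgs) => string

const tools: Record<string, Tool> = {
  // Stubbed weather lookup; a real tool would call an external service.
  getWeather: (args) => `Sunny in ${String(args.city)}`,
}

// Parse a tool call emitted by the model and run the matching local function.
// The { name, arguments } shape is assumed for this sketch.
function dispatchToolCall(rawCall: string): string {
  let parsed: unknown
  try {
    parsed = JSON.parse(rawCall)
  } catch {
    return 'error: invalid JSON'
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return 'error: invalid tool call'
  }
  const { name, arguments: args } = parsed as {
    name?: unknown
    arguments?: unknown
  }
  if (
    typeof name !== 'string' ||
    !Object.prototype.hasOwnProperty.call(tools, name)
  ) {
    return 'error: unknown tool'
  }
  const toolArgs =
    typeof args === 'object' && args !== null ? (args as ToolArgs) : {}
  return tools[name](toolArgs)
}

console.log(
  dispatchToolCall('{"name":"getWeather","arguments":{"city":"Oslo"}}'),
) // Sunny in Oslo
console.log(dispatchToolCall('not json')) // error: invalid JSON
```

Returning an error string instead of throwing lets the result be passed back to the model so it can retry with a corrected call.
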