GPT OSS 20B

GPT OSS 20B is OpenAI's smaller open-weight model with roughly 21 billion total parameters and 3.6 billion active per token, designed for low-latency, agentic, and on-device workloads.

Reasoning, Tool Use
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'openai/gpt-oss-20b',
  prompt: 'Why is the sky blue?'
})

Playground

Try out GPT OSS 20B by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.



Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

| Provider | Context | Latency | Throughput | Input | Output | Cache read | Release date |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Amazon Bedrock | 128K | 0.4s | 59tps | $0.07/M | $0.30/M | - | 08/05/2025 |
| Fireworks | 128K | 5.1s | 32tps | $0.07/M | $0.30/M | $0.04/M | 08/05/2025 |
| Groq | 131K | 0.8s | 778tps | $0.07/M | $0.30/M | $0.04/M | 08/05/2025 |
| DeepInfra | 131K | 0.2s | 113tps | $0.03/M | $0.14/M | - | 08/05/2025 |
| Together AI | 131K | 0.3s | 138tps | $0.05/M | $0.20/M | - | 08/05/2025 |
| Novita AI | 131K | 0.7s | 115tps | $0.04/M | $0.15/M | - | 08/05/2025 |
| Parasail | 131K | - | - | $0.04/M | $0.20/M | - | 08/05/2025 |
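
As a rough illustration of how the per-million-token prices in the provider table translate into a per-request cost, here is a minimal sketch. The function names are hypothetical, the mapping from provider names to slugs is an assumption based on the slugs listed elsewhere on this page, and cache discounts are ignored.

```typescript
// Per-million-token prices in USD, copied from the provider table.
// Assumption: each slug corresponds to the provider row of the same name.
type Pricing = { inputPerM: number; outputPerM: number }

const PRICING: Record<string, Pricing> = {
  bedrock: { inputPerM: 0.07, outputPerM: 0.30 },
  fireworks: { inputPerM: 0.07, outputPerM: 0.30 },
  groq: { inputPerM: 0.07, outputPerM: 0.30 },
  deepinfra: { inputPerM: 0.03, outputPerM: 0.14 },
  togetherai: { inputPerM: 0.05, outputPerM: 0.20 },
  novita: { inputPerM: 0.04, outputPerM: 0.15 },
  parasail: { inputPerM: 0.04, outputPerM: 0.20 },
}

// Estimated USD cost of one request, ignoring cache read discounts.
function estimateCostUsd(provider: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[provider]
  if (!p) throw new Error(`Unknown provider slug: ${provider}`)
  return (inputTokens / 1e6) * p.inputPerM + (outputTokens / 1e6) * p.outputPerM
}

// Slug with the lowest estimated cost for a given token mix.
function cheapestProvider(inputTokens: number, outputTokens: number): string {
  let best = ''
  let bestCost = Infinity
  for (const slug of Object.keys(PRICING)) {
    const cost = estimateCostUsd(slug, inputTokens, outputTokens)
    if (cost < bestCost) {
      best = slug
      bestCost = cost
    }
  }
  return best
}
```

Price is only one axis; the latency and throughput columns may matter more for interactive workloads.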
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
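
The two metrics can be combined into a back-of-the-envelope estimate of how long a full response takes: time to first token plus output tokens divided by throughput. The sketch below assumes latency is given in seconds, as in the provider table, and uses a hypothetical helper name. P50 figures are medians on live traffic, so individual requests will vary.

```typescript
// Rough end-to-end time for a streamed response, in seconds.
// ttftSeconds: time to first token; tokensPerSecond: generation throughput.
function estimateResponseSeconds(
  ttftSeconds: number,
  tokensPerSecond: number,
  outputTokens: number
): number {
  if (tokensPerSecond <= 0) throw new RangeError('tokensPerSecond must be positive')
  return ttftSeconds + outputTokens / tokensPerSecond
}

// Example inputs taken from the provider table: a 1,000-token answer on
// Groq (0.8s, 778tps) versus DeepInfra (0.2s, 113tps).
const groqSeconds = estimateResponseSeconds(0.8, 778, 1000)
const deepinfraSeconds = estimateResponseSeconds(0.2, 113, 1000)
```

For short outputs the provider with the lowest time to first token tends to finish first; for long outputs throughput dominates.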

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by OpenAI

| Model | Context | Latency | Throughput | Input | Output | Cache read | Web search | Providers | Release date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 1M | 3.0s | 59tps | $5.00/M | $30.00/M | $0.5/M | $10.00/K + input costs | azure, bedrock, openai | 04/24/2026 |
|  | 400K | 1.7s | 130tps | $0.75/M | $4.50/M | $0.07/M | $10.00/K + input costs | azure, openai | 03/17/2026 |
|  | 1.1M | 2.9s | 79tps | $2.50/M | $15.00/M | $0.25/M | $10.00/K + input costs | azure, openai | 03/05/2026 |
|  | 400K | 4.7s | 127tps | $0.25/M | $2.00/M | $0.03/M | $14/K + input costs | azure, openai | 08/07/2025 |
|  | 131K | 0.2s | 266tps | $0.35/M | $0.75/M | $0.25/M | - | baseten, bedrock, cerebras, +5 more | 08/05/2025 |
|  | 128K | 0.7s | 59tps | $0.15/M | $0.60/M | $0.07/M | $14/K + input costs | azure, openai | 07/18/2024 |

About GPT OSS 20B

GPT OSS 20B was released by OpenAI on August 5, 2025 under the Apache 2.0 license, alongside the larger gpt-oss-120b. Both are mixture-of-experts transformers that use alternating dense and locally banded sparse attention, grouped multi-query attention, and rotary positional embeddings. Both natively support up to 131,072 tokens of context (about 131K).

The 20B label refers to the total parameter count, which is about 21 billion. Only about 3.6 billion parameters activate per token, and that active count is what drives per-token compute cost; the memory needed to hold the weights still scales with the total. OpenAI reports that GPT OSS 20B matches or exceeds o3-mini on common evaluations and outperforms it on competition math (AIME) and HealthBench, while running on a single device with 16 GB of memory.
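
To make the total-versus-active distinction concrete, the short sketch below computes the share of parameters that participate in each token, using the approximate figures quoted above. The helper name is hypothetical, and the result is only as precise as those rounded counts.

```typescript
// Fraction of parameters active per token in a mixture-of-experts model.
function activeFraction(activeBillions: number, totalBillions: number): number {
  if (totalBillions <= 0) throw new RangeError('totalBillions must be positive')
  return activeBillions / totalBillions
}

// Approximate figures quoted for GPT OSS 20B: 3.6B active of 21B total.
const gptOss20bActiveShare = activeFraction(3.6, 21)
```

With these figures, roughly one parameter in six is used for any given token.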

GPT OSS 20B supports adjustable reasoning levels (low, medium, high), native function calling, and structured outputs. OpenAI positions it as the recommended starting point for most workloads, with gpt-oss-120b as the model to escalate to for the hardest reasoning steps.
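
One way to act on this guidance is to choose a reasoning level per step and escalate to the larger model only when a step looks hard. The sketch below is an illustration under stated assumptions: the difficulty score and its thresholds are hypothetical, and the 'openai/gpt-oss-120b' slug is assumed by analogy with 'openai/gpt-oss-20b'. How the reasoning level is passed to the model depends on the SDK and provider, so it is not shown here.

```typescript
type ReasoningLevel = 'low' | 'medium' | 'high'
type StepPlan = { model: string; reasoning: ReasoningLevel }

// Map a caller-supplied difficulty score in [0, 1] to a model and reasoning level.
// The thresholds are illustrative, not recommendations from OpenAI.
function planStep(difficulty: number): StepPlan {
  if (!(difficulty >= 0 && difficulty <= 1)) {
    throw new RangeError('difficulty must be between 0 and 1')
  }
  if (difficulty < 0.3) return { model: 'openai/gpt-oss-20b', reasoning: 'low' }
  if (difficulty < 0.6) return { model: 'openai/gpt-oss-20b', reasoning: 'medium' }
  if (difficulty < 0.85) return { model: 'openai/gpt-oss-20b', reasoning: 'high' }
  // Hardest steps: escalate to the larger model (slug is an assumption).
  return { model: 'openai/gpt-oss-120b', reasoning: 'high' }
}
```

Keeping the decision in one function makes it easy to tune the thresholds against your own evaluation results.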

Through AI Gateway, you reach GPT OSS 20B with a single API key, route to any of the provider slugs bedrock, fireworks, groq, deepinfra, togetherai, novita, or parasail as needed, and read live throughput and latency from this page. There is no GPU provisioning and no separate provider account to manage.
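
A provider preference might be expressed as an ordered list of slugs. The sketch below builds such an options object and rejects slugs that are not listed on this page. The `providerOptions.gateway.order` shape is an assumption about the AI Gateway request options and should be checked against the docs; the helper name is hypothetical.

```typescript
// Provider slugs listed on this page for GPT OSS 20B.
const KNOWN_SLUGS: string[] = [
  'bedrock', 'fireworks', 'groq', 'deepinfra', 'togetherai', 'novita', 'parasail',
]

// Build a provider-preference object, validating slugs first.
// Assumption: the gateway accepts { gateway: { order: [...] } } as provider options.
function gatewayProviderOptions(order: string[]): { gateway: { order: string[] } } {
  const unknown = order.filter((slug) => KNOWN_SLUGS.indexOf(slug) === -1)
  if (unknown.length > 0) {
    throw new Error(`Unknown provider slug(s): ${unknown.join(', ')}`)
  }
  return { gateway: { order: order.slice() } }
}

// Possible usage with the AI SDK (not executed here):
// streamText({
//   model: 'openai/gpt-oss-20b',
//   prompt: 'Why is the sky blue?',
//   providerOptions: gatewayProviderOptions(['groq', 'deepinfra']),
// })
```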

What To Consider When Choosing a Provider

  • Configuration: GPT OSS 20B is a mixture-of-experts model — roughly 21 billion total parameters with about 3.6 billion active per token. That makes it inexpensive to serve and a sensible default for most workloads. AI Gateway routes requests to it as a managed API, so you can use it without standing up your own inference stack.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
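
For the authentication point above, a minimal sketch of picking a credential from the environment could look like the following. The variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN` and the choice to prefer the API key are assumptions for illustration; confirm both against the AI Gateway documentation.

```typescript
type EnvLike = { [name: string]: string | undefined }
type GatewayCredential = { kind: 'api-key' | 'oidc'; token: string }

// Pick an API key if present, otherwise fall back to an OIDC token.
// Environment variable names are assumptions, not confirmed by this page.
function resolveGatewayCredential(env: EnvLike): GatewayCredential {
  const apiKey = env['AI_GATEWAY_API_KEY']
  if (apiKey) return { kind: 'api-key', token: apiKey }
  const oidcToken = env['VERCEL_OIDC_TOKEN']
  if (oidcToken) return { kind: 'oidc', token: oidcToken }
  throw new Error('No AI Gateway credential found: provide an API key or an OIDC token')
}

// Possible usage in Node.js (not executed here):
// const credential = resolveGatewayCredential(process.env)
```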

When to Use GPT OSS 20B

Best For

  • Agentic and tool-using flows: Native function calling and adjustable reasoning levels suit multi-step agents
  • High-volume general workloads: Roughly o3-mini-level quality at low active-parameter cost
  • Open-weight requirements: Apache 2.0 license permits inspection, deployment, and redistribution
  • Long-context tasks: About 131K tokens of context for document-heavy or transcript-heavy work
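
For the long-context case, a quick pre-flight check can flag inputs that are unlikely to fit. The sketch below assumes a context window of 131,072 tokens and a rough heuristic of four characters per token; both are assumptions, real token counts depend on the tokenizer, and some providers list a 128K window.

```typescript
// Assumed context window; some providers on this page list 128K instead.
const CONTEXT_WINDOW_TOKENS = 131072
// Rough heuristic for English text, not an actual tokenizer.
const APPROX_CHARS_PER_TOKEN = 4

// Approximate token count from character length.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / APPROX_CHARS_PER_TOKEN)
}

// True if the prompt plus a reserved output budget fits in the assumed window.
function fitsInContext(text: string, reservedOutputTokens: number): boolean {
  return estimateTokens(text) + reservedOutputTokens <= CONTEXT_WINDOW_TOKENS
}
```

Inputs close to the limit are worth counting with a real tokenizer before sending, since the heuristic can be off in either direction.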

Consider Alternatives When

  • Hardest reasoning tasks: gpt-oss-120b activates more parameters per token and approaches o4-mini on core benchmarks
  • Frontier closed-source quality: GPT-5 and GPT-5.1 for the strongest proprietary capability
  • Coding-specific work: Codex models are tuned for software engineering workflows
  • Vision or multimodal inputs: GPT OSS 20B is text-only; use GPT-4o or GPT-5 family models for images

Conclusion

GPT OSS 20B is the smaller of the two open-weight OpenAI releases, but it is not a strictly weaker option — its low active-parameter cost, agentic features, and o3-mini-level benchmark numbers make it a reasonable default for most workloads. Use it through AI Gateway when you want open weights without managing infrastructure.