GPT OSS 20B
GPT OSS 20B is OpenAI's smaller open-weight model with roughly 21 billion total parameters and 3.6 billion active per token, designed for low-latency, agentic, and on-device workloads.
import { streamText } from 'ai'
const result = streamText({ model: 'openai/gpt-oss-20b', prompt: 'Why is the sky blue?' })
Playground
Try out GPT OSS 20B by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
Per-provider metrics shown on this page:
- P50 throughput on live AI Gateway traffic, in tokens per second (TPS).
- P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds.
- Direct request success rate on AI Gateway and per provider.
About GPT OSS 20B
GPT OSS 20B was released by OpenAI on August 5, 2025 under the Apache 2.0 license, alongside the larger gpt-oss-120b. Both are mixture-of-experts transformers using alternating dense and locally banded sparse attention, grouped multi-query attention, and rotary positional embeddings with native support for 131.1K tokens of context.
The 20B label refers to total parameter count — about 21 billion. Only roughly 3.6 billion parameters activate per token, which is what determines inference cost. OpenAI reports GPT OSS 20B matches or exceeds o3-mini on common evaluations and outperforms it on competition math (AIME) and HealthBench, while running on a single device with 16 GB of memory.
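The relationship between total and active parameters can be made concrete with a little arithmetic. The sketch below uses only the approximate figures quoted above; the function name `activeFraction` is illustrative.

```typescript
// Approximate parameter counts as quoted in the text above.
const TOTAL_PARAMS = 21e9;
const ACTIVE_PARAMS = 3.6e9;

// Fraction of the model's parameters that participate in each token's forward pass.
function activeFraction(active: number, total: number): number {
  if (total <= 0) throw new RangeError('total must be positive');
  return active / total;
}

const fraction = activeFraction(ACTIVE_PARAMS, TOTAL_PARAMS);
console.log(`${(fraction * 100).toFixed(1)}% of parameters active per token`);
// prints "17.1% of parameters active per token"
```

In other words, per-token compute scales with roughly one sixth of the headline parameter count, while memory still has to hold all of the weights.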
GPT OSS 20B supports adjustable reasoning levels (low, medium, high), native function calling, and structured outputs. OpenAI positions it as the recommended starting point for most workloads, with gpt-oss-120b available as an escalation path for the hardest reasoning steps.
Through AI Gateway, you reach GPT OSS 20B with a single API key, route to bedrock, fireworks, groq, deepinfra, togetherai, novita, or parasail as needed, and read live throughput and latency from this page. No GPU provisioning, no separate provider account.
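As a rough illustration of expressing a provider preference, the sketch below builds the options object you might pass to `streamText` or `generateText`. It assumes the gateway reads an ordered preference list from `providerOptions.gateway.order`; check the AI Gateway docs for the exact option names. The helper `buildCallOptions` is hypothetical, and the slugs are the ones listed above.

```typescript
// Provider slugs listed for this model in the paragraph above.
const KNOWN_PROVIDERS = [
  'bedrock', 'fireworks', 'groq', 'deepinfra', 'togetherai', 'novita', 'parasail',
] as const;
type ProviderSlug = (typeof KNOWN_PROVIDERS)[number];

// Hypothetical helper: builds call options for streamText/generateText.
// Assumption: provider preference is read from providerOptions.gateway.order.
function buildCallOptions(prompt: string, preferred: ProviderSlug[]) {
  // Drop duplicate slugs while keeping first-seen order.
  const order = preferred.filter((slug, i) => preferred.indexOf(slug) === i);
  return {
    model: 'openai/gpt-oss-20b',
    prompt,
    providerOptions: { gateway: { order } },
  };
}

const options = buildCallOptions('Why is the sky blue?', ['groq', 'fireworks', 'groq']);
console.log(JSON.stringify(options.providerOptions));
// prints {"gateway":{"order":["groq","fireworks"]}}
```

Typing the slugs as a union means a misspelled provider name fails at compile time rather than at request time.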
What To Consider When Choosing a Provider
- Configuration: GPT OSS 20B is a mixture-of-experts model — roughly 21 billion total parameters with about 3.6 billion active per token. That makes it inexpensive to serve and a sensible default for most workloads. AI Gateway routes requests to it as a managed API, so you can use it without standing up your own inference stack.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
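The authentication point above can be sketched as a small helper that picks whichever credential is present and turns it into a bearer header. The environment variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN` and the helper `gatewayAuthHeaders` are assumptions for illustration; confirm the names against the AI Gateway documentation.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: chooses a credential and returns request headers.
// Assumption: AI_GATEWAY_API_KEY and VERCEL_OIDC_TOKEN are the variable names in use.
function gatewayAuthHeaders(env: Env): Record<string, string> {
  // Prefer the API key; fall back to the OIDC token when the key is unset or empty.
  const token = env.AI_GATEWAY_API_KEY || env.VERCEL_OIDC_TOKEN;
  if (!token) throw new Error('No AI Gateway credential found in environment');
  return {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'application/json',
  };
}

const headers = gatewayAuthHeaders({ AI_GATEWAY_API_KEY: 'example-key' });
console.log(headers.Authorization); // prints "Bearer example-key"
```

In a Node.js process you would typically pass `process.env` instead of the literal object used here.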
When to Use GPT OSS 20B
Best For
- Agentic and tool-using flows: Native function calling and adjustable reasoning levels suit multi-step agents
- High-volume general workloads: Roughly o3-mini-level quality at low active-parameter cost
- Open-weight requirements: Apache 2.0 license permits inspection, deployment, and redistribution
- Long-context tasks: 131.1K tokens of context for document-heavy or transcript-heavy work
Consider Alternatives When
- Hardest reasoning tasks: `gpt-oss-120b` activates more parameters per token and approaches o4-mini on core benchmarks
- Frontier closed-source quality: GPT-5 and GPT-5.1 for the strongest proprietary capability
- Coding-specific work: Codex models are tuned for software engineering workflows
- Vision or multimodal inputs: GPT OSS 20B is text-only; use GPT-4o or GPT-5 family models for images
Conclusion
GPT OSS 20B is the smaller of the two open-weight OpenAI releases, but it is not a strictly weaker option — its low active-parameter cost, agentic features, and o3-mini-level benchmark numbers make it a reasonable default for most workloads. Use it through AI Gateway when you want open weights without managing infrastructure.
Frequently Asked Questions
How does GPT OSS 20B compare to `gpt-oss-120b`?
Both share the same MoE architecture, 131.1K tokens of context, and Apache 2.0 license. GPT OSS 20B activates fewer parameters per token (about 3.6 billion versus 5.1 billion), runs on a 16 GB device, and matches o3-mini on common benchmarks. `gpt-oss-120b` approaches o4-mini on harder reasoning tasks but costs more to serve.
What hardware is GPT OSS 20B designed for?
OpenAI designed it to run on a single device with 16 GB of memory, including consumer-grade hardware. Weights ship natively quantized in MXFP4. When used through AI Gateway, you don't manage hardware: requests route to bedrock, fireworks, groq, deepinfra, togetherai, novita, or parasail.
Does GPT OSS 20B support tool calling and structured outputs?
Yes. GPT OSS 20B supports native function calling, structured outputs, and long chain-of-thought reasoning. You can also select a reasoning level (low, medium, or high) per request, similar to the o-series.
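To make this concrete, here is a minimal sketch of a Chat Completions-style request body that combines a reasoning level with one tool definition. The field names follow OpenAI's Chat Completions conventions (`reasoning_effort` with `low`/`medium`/`high`, and `tools`); whether a given gateway endpoint forwards both for this model is an assumption to verify. The `get_weather` tool and the `buildChatRequest` helper are hypothetical.

```typescript
type ReasoningEffort = 'low' | 'medium' | 'high';

interface ToolDefinition {
  type: 'function';
  function: { name: string; description: string; parameters: Record<string, unknown> };
}

// Hypothetical tool for illustration; its parameters are described with JSON Schema.
const getWeatherTool: ToolDefinition = {
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Look up the current weather for a city',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  },
};

// Builds a Chat Completions-style request body.
// Assumption: the endpoint accepts `reasoning_effort` and `tools` for this model.
function buildChatRequest(userMessage: string, effort: ReasoningEffort, tools: ToolDefinition[]) {
  return {
    model: 'openai/gpt-oss-20b',
    messages: [{ role: 'user' as const, content: userMessage }],
    reasoning_effort: effort,
    tools,
  };
}

const body = buildChatRequest('Do I need an umbrella in Paris?', 'low', [getWeatherTool]);
console.log(JSON.stringify(body, null, 2));
```

A lower effort level generally trades reasoning depth for latency, so a simple tool-selection step like this one is a plausible place to start with `low`.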
What license is GPT OSS 20B released under?
Apache 2.0. You can inspect, deploy, fine-tune, and redistribute the weights without OpenAI licensing restrictions. Fine-tuning happens outside AI Gateway since AI Gateway serves models as a managed API.
What context window does GPT OSS 20B support?
131.1K tokens, sufficient for long documents, multi-turn agentic sessions, and transcript-heavy workloads.
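A quick pre-flight check can estimate whether a document is likely to fit. The sketch below assumes that "131.1K" is the rounded form of 131,072 tokens and uses a rough four-characters-per-token heuristic for English text rather than the model's real tokenizer, so treat the result as an estimate only. The function names are illustrative.

```typescript
const CONTEXT_WINDOW_TOKENS = 131_072; // assumed exact value behind "131.1K"
const CHARS_PER_TOKEN = 4; // rough heuristic for English text, not the real tokenizer

// Estimates the token count of a string from its character length.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// True when the prompt plus the reserved output budget is estimated to fit.
function fitsInContext(text: string, reservedOutputTokens: number): boolean {
  return estimateTokens(text) + reservedOutputTokens <= CONTEXT_WINDOW_TOKENS;
}

const transcript = 'word '.repeat(100_000); // 500,000 characters
console.log(estimateTokens(transcript)); // 125000
console.log(fitsInContext(transcript, 4_096)); // true: 129096 <= 131072
console.log(fitsInContext(transcript, 8_192)); // false: 133192 > 131072
```

Reserving output tokens up front matters because the prompt and the generated response share the same context window.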
Can I call GPT OSS 20B through the AI SDK?
Yes. GPT OSS 20B is available through AI Gateway via the AI SDK as well as Chat Completions, Responses, and Messages-compatible API formats. Use `openai/gpt-oss-20b` as the model identifier.
Is Zero Data Retention available for GPT OSS 20B?
Yes, Zero Data Retention is available for this model. Zero Data Retention is offered on a per-provider basis. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.