GPT OSS 20B
GPT OSS 20B is OpenAI's open-weight, 20-billion-parameter language model: a lightweight yet capable option suited to cost-efficient deployment.
import { streamText } from 'ai'

const result = streamText({
  model: 'openai/gpt-oss-20b',
  prompt: 'Why is the sky blue?',
})
Playground
Try out GPT OSS 20B by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
About GPT OSS 20B
GPT OSS 20B became available on AI Gateway on August 5, 2025, alongside gpt-oss-120b, as part of OpenAI's open-source model initiative. At 20 billion parameters, it is the more compact of the two releases, designed for scenarios where the full 120B model's resource requirements are impractical.
Despite its smaller size, GPT OSS 20B handles chat, content generation, summarization, and analysis tasks. Its open weights can be inspected and audited, which is valuable for organizations with transparency requirements.
The model's compact size makes it a practical option for teams whose deployment environments cannot accommodate models at the 120B-parameter scale.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
Live metrics shown for this model:
- Throughput: P50 on live AI Gateway traffic, in tokens per second (TPS)
- Time to first token (TTFT): P50 on live AI Gateway traffic, in milliseconds
- Success rate: direct request success rate on AI Gateway and per provider
Visit the docs for more info.
What To Consider When Choosing a Provider
- Model size: At 20B parameters, GPT OSS 20B is far more practical to deploy on standard infrastructure than the 120B variant, trading some capability for lower resource requirements.
- Configuration: Through AI Gateway you can use it immediately without provisioning GPU infrastructure.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
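As a rough sketch of the provider preference mentioned above, the helper below builds the options for a request that lists providers in priority order. The `providerOptions.gateway.order` shape is an assumption that this page does not confirm, `buildGatewayCall` is a hypothetical name, and the slugs are placeholders; check the AI Gateway docs for the exact field names and copy real slugs from the provider list.

```typescript
// Hypothetical helper: builds call options that carry a provider preference.
// Assumption: AI Gateway reads provider ordering from providerOptions.gateway.order.
type GatewayCallOptions = {
  model: string
  prompt: string
  providerOptions: { gateway: { order: string[] } }
}

function buildGatewayCall(prompt: string, preferredProviders: string[]): GatewayCallOptions {
  if (preferredProviders.length === 0) {
    throw new Error('List at least one provider slug')
  }
  return {
    model: 'openai/gpt-oss-20b',
    prompt,
    // Copy the array so later changes by the caller do not affect the request.
    providerOptions: { gateway: { order: [...preferredProviders] } },
  }
}

// Placeholder slugs for illustration only.
const call = buildGatewayCall('Why is the sky blue?', ['provider-a', 'provider-b'])
console.log(JSON.stringify(call.providerOptions))
// With the `ai` package installed, this object could be passed on as: streamText(call)
```

Keeping the options in a plain object makes the preference easy to log and test before any request is sent. Authentication stays outside this object, since AI Gateway handles it with its own API key or OIDC token.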
When to Use GPT OSS 20B
Best For
- Lightweight open-source deployment: Open-weight models with reasonable infrastructure requirements
- Cost-efficient open-source: Applications that need open-weight transparency at lower compute cost
- Edge deployment research: Exploring deployment of capable models in resource-constrained environments
- General-purpose tasks: Chat, summarization, and content generation where 20B scale is sufficient
Consider Alternatives When
- Higher capability needed: gpt-oss-120b for stronger open-source performance
- Maximum quality: GPT-5 or GPT-5.2 for higher capability from closed-source models
- Specialized tasks: Codex models for coding, o-series for reasoning
- Smallest possible model: GPT-5 nano or GPT-4.1 nano for minimal-cost inference
Conclusion
GPT OSS 20B provides a practical entry point to open-source language models from OpenAI, balancing capability with efficiency. Available through AI Gateway, it serves teams that need open weights without the infrastructure demands of larger models.
Frequently Asked Questions
How does GPT OSS 20B compare to gpt-oss-120b?
It's more compact (20B vs 120B parameters), making it cheaper to run and easier to self-host, with correspondingly lower capability on complex tasks.
What tasks can GPT OSS 20B handle?
Chat, content generation, summarization, analysis, and other general-purpose language tasks where 20B parameter scale provides sufficient quality.
What context window does GPT OSS 20B support?
131.1K tokens (131,072).
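A minimal sketch of a pre-flight check against that limit follows. It assumes the window is exactly 131,072 tokens (the unrounded value behind 131.1K) and uses a rough four-characters-per-token estimate, which is a heuristic and not the model's tokenizer. `estimateTokens` and `fitsInContext` are hypothetical names.

```typescript
const CONTEXT_WINDOW_TOKENS = 131072 // assumed exact value behind "131.1K"
const CHARS_PER_TOKEN = 4 // rough heuristic for English text, not a real tokenizer

// Estimate the token count from character length, rounding up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN)
}

// True when the estimated prompt size plus the space reserved for the reply fits in the window.
function fitsInContext(prompt: string, reservedOutputTokens: number): boolean {
  return estimateTokens(prompt) + reservedOutputTokens <= CONTEXT_WINDOW_TOKENS
}

const shortPrompt = 'Why is the sky blue?'
console.log(estimateTokens(shortPrompt)) // 20 characters -> 5
console.log(fitsInContext(shortPrompt, 1000)) // true
```

Because the estimate is coarse, leaving a safety margin below the limit is a sensible default; for exact counts you would need the model's own tokenizer.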
How does AI Gateway handle authentication for GPT OSS 20B?
AI Gateway accepts a single API key or OIDC token for all requests. You don't embed OpenAI credentials in your application; AI Gateway routes and authenticates on your behalf.
What are typical latency characteristics?
This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.
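To compare those published figures with your own requests, one option is to record when each streamed chunk arrives and summarize the timings. The sketch below takes timestamps as input so it runs without network access. `summarizeTimings` and `p50` are hypothetical helpers, and the rate counts chunks, not tokens, so it only approximates TPS.

```typescript
type TimingSummary = { ttftMs: number; chunksPerSecond: number }

// startMs: when the request was sent; arrivalsMs: when each streamed chunk arrived (ascending).
function summarizeTimings(startMs: number, arrivalsMs: number[]): TimingSummary {
  if (arrivalsMs.length === 0) throw new Error('No chunks received')
  const first = arrivalsMs[0]
  const last = arrivalsMs[arrivalsMs.length - 1]
  const streamMs = last - first
  return {
    ttftMs: first - startMs,
    // Rate over the streaming phase only; 0 when a single chunk makes it undefined.
    chunksPerSecond: streamMs > 0 ? ((arrivalsMs.length - 1) * 1000) / streamMs : 0,
  }
}

// Median (P50) of a set of samples, for example ttftMs values from several runs.
function p50(samples: number[]): number {
  if (samples.length === 0) throw new Error('No samples')
  const sorted = [...samples].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}

const run = summarizeTimings(0, [250, 300, 350, 400, 450])
console.log(run.ttftMs, run.chunksPerSecond) // 250 20
console.log(p50([300, 100, 200])) // 200
```

In a real request, the arrival times could be gathered by pushing `Date.now()` into an array for each chunk read from the stream. A handful of local runs will not match aggregate gateway traffic, so treat the result as a sanity check and not a benchmark.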