GPT OSS 20B

GPT OSS 20B is OpenAI's open-weight, 20-billion-parameter language model: a lightweight yet capable option suited to cost-efficient deployment.

Capabilities: Reasoning, Tool Use
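index.ts

    import { streamText } from 'ai'

    // Request a streamed completion from the 20B model via its AI Gateway slug.
    const result = streamText({
      model: 'openai/gpt-oss-20b',
      prompt: 'Why is the sky blue?'
    })

    // Print the text as it streams in.
    for await (const chunk of result.textStream) {
      process.stdout.write(chunk)
    }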

Playground

Try out GPT OSS 20B by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

About GPT OSS 20B

GPT OSS 20B became available on AI Gateway on August 5, 2025, alongside gpt-oss-120b, as part of OpenAI's open-source model initiative. At 20 billion parameters, it is the more compact of the two releases, designed for scenarios where the full 120B model's resource requirements are impractical.

Despite its smaller size, GPT OSS 20B handles chat, content generation, summarization, and analysis tasks. Open weights make it inspectable and auditable, which is valuable for organizations with transparency requirements.

The model's compact size makes it a practical option for teams whose deployment targets cannot accommodate a 120B-parameter model.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

| Provider       | Legal          | Context | Latency | Throughput | Input   | Output  | Cache Read | Release Date |
|----------------|----------------|---------|---------|------------|---------|---------|------------|--------------|
| Amazon Bedrock | Terms, Privacy | 128K    | 0.3s    | 230tps     | $0.07/M | $0.30/M |            | 08/05/2025   |
| Fireworks      | Terms, Privacy | 128K    | 1.0s    | 56tps      | $0.07/M | $0.30/M | $0.04/M    | 08/05/2025   |
| Groq           | Terms, Privacy | 131K    | 0.1s    |            | $0.07/M | $0.30/M | $0.04/M    | 08/05/2025   |
| DeepInfra      | Terms, Privacy | 131K    | 0.2s    | 95tps      | $0.03/M | $0.14/M |            | 08/05/2025   |
| Together AI    | Terms, Privacy | 131K    | 0.5s    | 113tps     | $0.05/M | $0.20/M |            | 08/05/2025   |
| Novita AI      | Terms, Privacy | 131K    | 0.5s    | 100tps     | $0.04/M | $0.15/M |            | 08/05/2025   |
| Parasail       | Terms, Privacy | 131K    | 0.4s    | 77tps      | $0.04/M | $0.20/M |            | 08/05/2025   |
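
As a rough illustration of how the per-token prices above translate into a provider preference, the sketch below estimates the cost of one request per provider and sorts the providers from cheapest to most expensive. The slug strings are placeholders chosen for this example (copy the real slugs from the page), and the `{ gateway: { order } }` shape is an assumption about how a preference would be passed as provider options; check the AI Gateway docs before relying on it.

```typescript
// Prices in USD per million tokens, copied from the provider table on this page.
// NOTE: the slug values are hypothetical placeholders, not confirmed slugs.
interface ProviderPrice {
  slug: string;
  inputPerM: number;
  outputPerM: number;
}

const providers: ProviderPrice[] = [
  { slug: 'bedrock', inputPerM: 0.07, outputPerM: 0.30 },
  { slug: 'fireworks', inputPerM: 0.07, outputPerM: 0.30 },
  { slug: 'groq', inputPerM: 0.07, outputPerM: 0.30 },
  { slug: 'deepinfra', inputPerM: 0.03, outputPerM: 0.14 },
  { slug: 'togetherai', inputPerM: 0.05, outputPerM: 0.20 },
  { slug: 'novita', inputPerM: 0.04, outputPerM: 0.15 },
  { slug: 'parasail', inputPerM: 0.04, outputPerM: 0.20 },
];

// Cost of a single request in USD: tokens times the per-million price, scaled down.
function estimateCostUSD(p: ProviderPrice, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1e6;
}

// Return slugs sorted from cheapest to most expensive for the given token counts.
function rankByCost(list: ProviderPrice[], inputTokens: number, outputTokens: number): string[] {
  return [...list]
    .sort(
      (a, b) =>
        estimateCostUSD(a, inputTokens, outputTokens) -
        estimateCostUSD(b, inputTokens, outputTokens),
    )
    .map((p) => p.slug);
}

// Example: a request with 2,000 input tokens and 500 output tokens.
const order = rankByCost(providers, 2000, 500);

// Assumed shape for expressing the preference as provider options.
const providerOptions = { gateway: { order } };
```

For this request size the cheapest entry is DeepInfra at roughly $0.00013 per request, followed by Novita AI. Price is only one axis; the latency and throughput columns may matter more for interactive use.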
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
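
The two metrics combine into a rough end-to-end estimate: time to first token plus the time needed to stream the remaining output at the measured throughput. The sketch below is a back-of-the-envelope model only, assuming throughput stays constant over the response; the function name and the 500-token example are illustrative, and the inputs are the P50 values from the provider table.

```typescript
// Estimate total response time in milliseconds from P50 TTFT and P50 throughput.
// Assumes generation proceeds at a constant tokens-per-second rate after the first token.
function estimateResponseMs(ttftMs: number, tokensPerSecond: number, outputTokens: number): number {
  if (tokensPerSecond <= 0) {
    throw new Error('tokensPerSecond must be positive');
  }
  return ttftMs + (outputTokens / tokensPerSecond) * 1000;
}

// Values taken from the provider table: Amazon Bedrock (0.3s, 230tps), Fireworks (1.0s, 56tps).
const bedrockMs = estimateResponseMs(300, 230, 500);
const fireworksMs = estimateResponseMs(1000, 56, 500);
```

For a 500-token response this gives about 2.5 seconds on the Amazon Bedrock figures and about 9.9 seconds on the Fireworks figures, which shows how throughput can dominate TTFT for longer outputs.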

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by OpenAI

| Model | Context | Latency | Throughput | Input   | Output   | Cache Read | Web Search Per Query   | Providers                    | Release Date |
|-------|---------|---------|------------|---------|----------|------------|------------------------|------------------------------|--------------|
|       | 1M      | 3.6s    | 83tps      | $5.00/M | $30.00/M | $0.50/M    | $10.00/K + input costs | azure, openai                | 04/24/2026   |
|       | 400K    | 2.8s    | 217tps     | $0.75/M | $4.50/M  | $0.07/M    | $10.00/K + input costs | azure, openai                | 03/17/2026   |
|       | 400K    | 0.8s    | 65tps      | $0.20/M | $1.25/M  | $0.02/M    | $10.00/K + input costs | azure, openai                | 03/17/2026   |
|       | 1.1M    | 0.7s    | 68tps      | $2.50/M | $15.00/M | $0.25/M    | $10.00/K + input costs | azure, openai                | 03/05/2026   |
|       | 128K    | 1.2s    | 78tps      | $1.25/M | $10.00/M | $0.13/M    | $10.00/K + input costs | azure, openai                | 11/12/2025   |
|       | 131K    | 0.2s    | 506tps     | $0.35/M | $0.75/M  | $0.25/M    |                        | baseten, bedrock, cerebras +5 | 08/05/2025   |

What To Consider When Choosing a Provider

  • Configuration: At 20B parameters, GPT OSS 20B is far more practical to deploy on standard infrastructure than the 120B variant, balancing capability against resource requirements. Through AI Gateway you can use it immediately without provisioning GPU infrastructure.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
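
To make the authentication point concrete, here is a minimal sketch of building an authenticated request with a single gateway key and no provider credentials. It assumes an OpenAI-compatible chat completions endpoint at `https://ai-gateway.vercel.sh/v1/chat/completions` and a bearer-token `Authorization` header; both, along with the helper name `buildChatRequest`, are assumptions for illustration and should be checked against the AI Gateway docs. The function only constructs the request and does not send it.

```typescript
// Shape of a request that could be passed to fetch(url, init).
interface GatewayRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
  };
}

// Hypothetical helper: builds (but does not send) a chat request for gpt-oss-20b.
// The endpoint URL and header format are assumptions, not confirmed by this page.
function buildChatRequest(apiKey: string, prompt: string): GatewayRequest {
  if (apiKey.length === 0) {
    throw new Error('Missing AI Gateway API key');
  }
  return {
    url: 'https://ai-gateway.vercel.sh/v1/chat/completions',
    init: {
      method: 'POST',
      headers: {
        // One gateway credential; no per-provider keys appear anywhere.
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'openai/gpt-oss-20b',
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}

const request = buildChatRequest('example-key', 'Why is the sky blue?');
```

In real use the key would come from an environment variable or secret store rather than a literal, and the resulting `url` and `init` would be handed to `fetch`.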

When to Use GPT OSS 20B

Best For

  • Lightweight open-source deployment: Open-weight models with reasonable infrastructure requirements
  • Cost-efficient open-source: Applications that need open-weight transparency at lower compute cost
  • Edge deployment research: Exploring deployment of capable models in resource-constrained environments
  • General-purpose tasks: Chat, summarization, and content generation where 20B scale is sufficient

Consider Alternatives When

  • Higher capability needed: gpt-oss-120b for stronger open-source performance
  • Maximum quality: GPT-5 or GPT-5.2 for higher capability from closed-source models
  • Specialized tasks: Codex models for coding, o-series for reasoning
  • Smallest possible model: GPT-5 nano or GPT-4.1 nano for minimal-cost inference

Conclusion

GPT OSS 20B provides a practical entry point to open-source language models from OpenAI, balancing capability with efficiency. Available through AI Gateway, it serves teams that need open weights without the infrastructure demands of larger models.

Frequently Asked Questions

  • How does GPT OSS 20B compare to gpt-oss-120b?

    It's more compact (20B vs 120B parameters), making it cheaper to run and easier to self-host, with correspondingly lower capability on complex tasks.

  • What tasks can GPT OSS 20B handle?

    Chat, content generation, summarization, analysis, and other general-purpose language tasks where 20B parameter scale provides sufficient quality.

  • What context window does GPT OSS 20B support?

    131.1K tokens on most providers; Amazon Bedrock and Fireworks list 128K.

  • How does AI Gateway handle authentication for GPT OSS 20B?

    AI Gateway accepts a single API key or OIDC token for all requests. You don't embed provider credentials in your application; AI Gateway routes and authenticates on your behalf.

  • What are typical latency characteristics?

    This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.
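
Because the listed context window differs slightly by provider, it can be worth checking a request against the limit before routing it. The sketch below assumes the context window must hold both the prompt and the generated output, and it uses the rounded figures from the provider table (128K and 131K) as approximate limits; exact token limits and real token counts would come from the provider and a tokenizer.

```typescript
// Approximate context limits in tokens, rounded as shown in the provider table.
const contextLimits: Record<string, number> = {
  'Amazon Bedrock': 128000,
  Fireworks: 128000,
  Groq: 131000,
  DeepInfra: 131000,
  'Together AI': 131000,
  'Novita AI': 131000,
  Parasail: 131000,
};

// True when prompt plus reserved output tokens fit within the given window.
function fitsContext(limit: number, promptTokens: number, maxOutputTokens: number): boolean {
  return promptTokens + maxOutputTokens <= limit;
}

// Names of providers whose listed window can hold the request.
function providersThatFit(promptTokens: number, maxOutputTokens: number): string[] {
  return Object.keys(contextLimits).filter((name) =>
    fitsContext(contextLimits[name], promptTokens, maxOutputTokens),
  );
}

// Example: a 125,000-token prompt with 4,000 tokens reserved for output.
const eligible = providersThatFit(125000, 4000);
```

With these approximate limits, the 129,000-token example fits the five providers listing 131K but not the two listing 128K.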